Suggestions for HW-setup

Mar 14, 2019
Hi,

looking for some suggestions for the system we want to improve on.
We want to run PVE on a dedicated server running the services described below. Up until now we have used Hetzner and are pretty OK with the service/costs.
This is what we use for our data analytics stack:

Base system (Debian+Proxmox) running:
* Router-VM (fw/routing services): connecting file/web-server to the world, offering ssh/(open)VPN access for development.
* ETL: some daily/weekly loads performed over night (spread out), usually idle during the day.
* Fileserver: the ETL server gets its files from here, as do different clients (all sorts of access methods).
* BI host: some BI tool like Tableau, Microstrategy, Jaspersoft, etc.; no direct connection to the world but linked to the web server and the database.
* Webserver: main connection for BI tool <-> world.
* 0-3 hosts offering additional services, like different web services, etc. Trivia: because some of the BI tools need fancier testing features, we don't "taint" the BI host/main webserver with these. Usually low load/requirements.

DB-system:
DWH running PostgreSQL on (minimal) CentOS/Debian. Connections are only allowed to the base system via ssh/postgres. A typical installation would be running 4 databases with ~400GB total; the largest database is ~160GB, with data stored mostly in 1-3 large tables (200M+ rows).
We do have a host with 256GB RAM where we use as much as 30% of the RAM and 6 cores for one huge query that still runs ~1 hour. Usual load: typical DWH queries where long time spans are summarized across a large number of attributes.
NOTE: co-locate in the same rack (&connect directly) if possible.

General usage pattern is as follows:
* Users push their files to the fileserver
* Mixed ETL/ELT jobs grab these files and load them into the DB.
* The BI host runs scheduled updates to refresh data for dashboards, reports, etc. after the ETL finishes. There are still a couple of ad-hoc queries, but most of the time queries hit data cached by the BI application. Updates are daily/weekly, depending on the client project.
* The webserver (+ other tooling servers) connects to the BI host and is the primary source of info for customers.

Requirements:
Users: usually 5-20, max concurrent usage <=50 users.
Downtime: we can live with 1 day of downtime; max. 3 days is acceptable if ultra-rare (less than once in 3 years). This mostly affects the backup strategy; the 3 days would be the worst-case setup time if the host fails completely, as it would take some time to "acknowledge" that, and a new one would need 1-2 days before it's ready (gladly not speaking from experience).
Backups: our current setup does not include a hot standby; we back up all databases (pg_dump) according to the load frequency, to different locations including off-site (see the sketch after this list). Similar strategy for the VMs; only the router VM/tooling/webserver have a low(er) frequency as there are no "changing" parts. We also back up all relevant configs for the DB/base system as well as a clear "install script" using these.
Storage: DB roughly between 300-450GB, while the base system needs 300-600GB.
CPU: Single thread performance not negligible as some of the tasks won't run in parallel (for both systems).
* Base system: 4-8 cores for the BI host, the more the merrier. 4+ for web, router and base. Overprovisioning for ETL/fileserver/additional services is useful as the load times do not overlap.
* DB system: as much as possible
RAM: both as much as possible (going for 128GB each / 256GB total seems to align with Hetzner prices at least)
* Base system: BI needs some 60GB plus, ETL has been observed to eat up to 10GB, and the rest probably takes some 20GB as well. 128GB should suffice.
* DB system: we would need to start fine-grained measurements, but I'd say 128GB should be enough, probably less.
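
To make the backup item above concrete, here is a minimal sketch of the kind of nightly pg_dump job we have in mind; database names, paths and the off-site target are made-up placeholders, not our actual setup:

#!/bin/bash
# nightly dump sketch (placeholder names/paths, adjust to the real setup)
set -euo pipefail
BACKUP_DIR=/backup/pgdump/$(date +%F)           # one directory per day
DATABASES="dwh_core dwh_client_a dwh_client_b"  # hypothetical database names
mkdir -p "$BACKUP_DIR"
for DB in $DATABASES; do
    # custom format (-Fc) is compressed and restorable with pg_restore
    pg_dump -U postgres -Fc -f "$BACKUP_DIR/${DB}.dump" "$DB"
done
# ship the dumps off-site as well, e.g. rsync over ssh (target is a placeholder)
rsync -a "$BACKUP_DIR" backupuser@offsite.example:/srv/pg-backups/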

Issue at hand:
With setup costs being relatively small, we usually swap hosts from time to time to leverage the monthly fees in terms of newer HW generations and to minimize the risk of worn-out HW.
Until now Hetzner's PX series was relatively cheap and allowed us to slap in 4x 480GB SSDs along with a BBU HW RAID. We used two identical hosts for the two systems above.
With pricing having shifted and the advent of NVMe, I wonder if these kinds of setups are still worth it. We might also fold the DB host into the base system, so we would only have to deal with one host, with everything else running inside containers.
There are a couple of choices to be made, and I wondered if somebody already has experience running a similar setup and cares to share some info. Most pressing questions (based on services from Hetzner; if anyone can suggest similarly priced services I'd gladly consider them too, my Google-fu didn't turn up anything vastly superior and they have never failed us up until now):
* Storage:
** All NVMe and ZFS (included in PVE) with RAIDZ?
*** If so, a separate SSD for the PVE system?
** NVMe (2 pieces) as ZIL/L2ARC alongside spinning HDDs or even SSDs? The 2 NVMe drives would be costly and too large (min. 980GB) for this kind of usage; also put the PVE system on them?
* RAM:
** DB/Base will use up at least 128GB, more would most likely be best.
** With this ~1TB of storage, how much RAM would I need for ZFS? Found a few rules of thumb, but nothing conclusive.
* CPU: the only question would be whether some configuration is not supported (Threadripper, Epyc?), but I doubt that, as the HW is not exactly fresh.

As said, I'm thankful for any thoughts on this.
Thanks, Thomas.
 
Hi,

some notes in general.
** All NVMe and ZFS (included in PVE) with RAIDZ?
An SQL DB and RAIDZ are not a good combination; it behaves like RAID 5/6 in this regard.
PostgreSQL and ZFS work well together, but you have to tune it. There are several guides on how to tune ZFS for PostgreSQL.
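For illustration, a minimal sketch of the dataset properties such guides typically suggest (pool/dataset names are placeholders; validate the values against your own workload):

# dedicated dataset for the PostgreSQL data directory
zfs create rpool/pgdata
# match recordsize to the 8K PostgreSQL page size (some guides prefer 16K)
zfs set recordsize=8k rpool/pgdata
# cheap compression usually pays off for DWH data
zfs set compression=lz4 rpool/pgdata
# no access-time updates for database files
zfs set atime=off rpool/pgdata
# often recommended because the database does its own journaling (WAL)
zfs set logbias=throughput rpool/pgdata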
*** If so, a separate SSD for the PVE system?
If the OS has its own storage, the load on the VM storage will be reduced.
Also, it is easier to handle it in case of a disaster.
** NVMe (2 pieces) as ZIL/L2ARC alongside spinning HDDs or even SSDs? The 2 NVMe drives would be costly and too large (min. 980GB) for this kind of usage; also put the PVE system on them?
Generally speaking, the ZIL/L2ARC must be much faster than the data store to bring a notable performance improvement.
So if you have enterprise SSDs, the answer is definitely no.
Anyway, NVMe is not a performance class; it also depends on the NAND on the disk.
What I mean is that there are NVMe drives on the market that are not as fast as some SSDs.
With this ~1TB of storage, how much RAM would I need for ZFS? Found a few rules of thumb, but nothing conclusive.
For the index, about 1GB.
For the L2ARC, 128 bytes for each block; this is only needed if you have an L2ARC cache.
ZFS will by default grab up to 50% of the memory for the ARC; it is tunable [1].
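As a rough illustration of those numbers (the sizes are assumptions, not taken from your setup): a 480GB L2ARC with an average 64K block size would need about 480G / 64K * 128B, roughly 1GB of RAM, just for the L2ARC headers. Limiting the ARC itself is done via the module parameter, see [1], e.g.:

# cap the ARC at 32GiB so the VMs/DB keep enough RAM (the value is only an example)
echo "options zfs zfs_arc_max=34359738368" > /etc/modprobe.d/zfs.conf
# apply on next boot (needed when root is on ZFS)
update-initramfs -u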
CPU: the only question would be whether some configuration is not supported (Threadripper, Epyc?), but I doubt that, as the HW is not exactly fresh.
We use both here.

[1] https://pve.proxmox.com/wiki/ZFS:_Tips_and_Tricks
 
Thank you for the info provided; this does help clarify some things.

An SQL DB and RAIDZ are not a good combination; it behaves like RAID 5/6 in this regard.
PostgreSQL and ZFS work well together, but you have to tune it. There are several guides on how to tune ZFS for PostgreSQL.
Thanks for pointing this out; I had read too much stuff and things got blurry at some point. I had the "2ndQuadrant Postgres ZFS" article [1] in mind when writing RAIDZ, but what it actually uses is a striped mirror/RAID10.
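For reference, the striped mirror that article ends up with would look roughly like this on four drives (pool and device names are placeholders for whatever the host exposes):

# two mirrored pairs striped together, i.e. the ZFS equivalent of RAID10
# ashift=12 assumes 4K-sector drives
zpool create -o ashift=12 tank \
    mirror /dev/nvme0n1 /dev/nvme1n1 \
    mirror /dev/nvme2n1 /dev/nvme3n1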

If the OS has its own storage, the load on the VM storage will be reduced.
Also, it is easier to handle it in case of a disaster.
Generally speaking, the ZIL/L2ARC must be much faster than the data store to bring a notable performance improvement.
So if you have enterprise SSDs, the answer is definitely no.
Anyway, NVMe is not a performance class; it also depends on the NAND on the disk.
What I mean is that there are NVMe drives on the market that are not as fast as some SSDs.
Both of these are labeled as "Datacenter Edition". My naive assumption was that each of these drives is a "good sample of its respective division", but I should check that.
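One way I could check that (just a sketch; target path and sizes are placeholders): a short fio run with synchronous 8K random writes, which roughly mimics what the ZIL and PostgreSQL's WAL care about:

# random 8K sync writes against the candidate drive
fio --name=syncwrite --filename=/mnt/testdrive/fio.tmp --size=4G \
    --rw=randwrite --bs=8k --ioengine=libaio --iodepth=1 --numjobs=1 \
    --direct=1 --sync=1 --runtime=60 --time_based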

One of the reasons for asking here was to see whether somebody runs a similar stack and can bring his/her experience into the mix, like "hands off XY" or "runs fine for us". As this is the first time I really have some choice with regard to system specs (e.g. one big host vs. multiple small instances), I'm a bit dumbfounded as to where to start planning this and how much can actually be planned.
I guess the usual response is: you gotta test. But of course I want to limit the number of specs to test, because we would have to pay for each run both in setup costs and from a time perspective. On the other hand, "acceptable" in both speed and complexity is already fine.

Again, thanks for the help so far.

[1] No external link allowed, search engines love this article though and place it prominently.
 
