Question about new install

smorvan · Mar 21, 2023

Hello all,

I will deploy a Proxmox host in the next few days and wanted to poll users here about the best hard drive strategy. I have had reasonably good experience (as a hobbyist) with 6.4 in the past on a Dell Precision 7810 with two Xeon E5-2698 v4 CPUs.

Here is the host setup:

- Ryzen 9 7950X (16 cores)
- Motherboard MSI Pro B650-Wifi
- 128 GB of DDR5 memory split over 4 banks
- 2 x 2 TB NVMe SSD
- 1 x Nvidia 3090 Ti
- 1 x Nvidia Quadro T1000
- 1 x 16 TB magnetic hard drive (Exos Enterprise X18)

I was thinking about the following layout:
Machine 1: For machine learning workloads
- 10 cores
- Ubuntu LTS
- 80 GB RAM
- 500 GB disk space in ZFS partition (see host)
- 500 GB NVMe space passed directly to the VM, for performance
- Nvidia 3090 Ti passed as a PCI device directly to the VM OS (so blacklisted in host)
- This machine is likely to see heavy usage day and night. Could be reasonably I/O intensive at times.

Machine 2: For conventional office tasks etc..
- 4 cores
- Windows 10
- 32 GB RAM
- 500 GB disk space in ZFS partition (see host)
- Nvidia Quadro T1000 passed as a PCI device directly to the VM OS (so blacklisted in host)
- This machine is likely to see light usage during the day (office-type workloads, SolidWorks - nothing crazy), virtually none at night.

For the Proxmox host:
- 2 cores
- 20 GB RAM
- OS Installed on Magnetic HD
- ZFS Raidz1 (Mirrored) partition on NVME SSD
- Use leftover space on Magnetic Hard Drive to perform daily backups

Here are my questions:

1) Any use for SLOG / ZIL for the NVME SSDs? If so, can I do that off a partition on either NVME SSD?
2) Any foreseeable risks of 'underprovisioning' the host (RAM / CPU or both, I know ZFS can play hard ball at times)?
3) Any better way for accomplishing this?
4) Last time I used Proxmox I could not do live backups, I probably did something wrong. Again, what are the gotchas to allow live backups?
5) Do I need CPU Pinning for the machine learning VM?
6) What about NUMA? Would either machine benefit from this? Do I need to allocate according to the hardware layout (e.g. 64 GB and only 8 cores)?
7) Do I need to disable the onboard graphics provided by the 7950X CPU?

TIA!

PS: I will very likely be buying one of the support options with SSH, not sure which one though.

aaron · Mar 21, 2023

smorvan said:
1) Any use for SLOG / ZIL for the NVME SSDs? If so, can I do that off a partition on either NVME SSD?

Not really. A dedicated SLOG device is useful for handling sync writes faster, but you already have NVMes.

smorvan said:
2) Any foreseeable risks of 'underprovisioning' the host (RAM / CPU or both, I know ZFS can play hard ball at times)?

AFAICT your memory plans don't line up that well. The CPU can usually be overprovisioned quite a bit, as long as the guests don't use it all the time. ML workloads sound like they will be utilizing the CPU constantly, and that is when you don't want to overprovision the CPU that much.
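As a rough illustration of the CPU side, here is a minimal sketch that compares the vCPUs planned for the guests against the host's cores. The core count and vCPU figures come from the first post; the dictionary keys and the function name are hypothetical labels, not Proxmox identifiers.

```python
# Host core count from the first post (Ryzen 9 7950X, 16 cores).
# With SMT enabled the host exposes more threads, which would lower the ratio.
HOST_CORES = 16

# vCPUs planned per guest, from the first post. Keys are illustrative labels.
PLANNED_VCPUS = {"ml_vm": 10, "office_vm": 4}


def vcpu_ratio(vcpus, host_cores):
    """Return planned guest vCPUs divided by host cores (above 1.0 means overprovisioned)."""
    return sum(vcpus.values()) / host_cores


ratio = vcpu_ratio(PLANNED_VCPUS, HOST_CORES)
print(f"{sum(PLANNED_VCPUS.values())} vCPUs on {HOST_CORES} cores, ratio {ratio:.2f}")
```

A ratio below 1.0 means the guests are not overprovisioned against physical cores, which fits a guest that is expected to keep its cores busy.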

smorvan said:
4) Last time I used Proxmox I could not do live backups, I probably did something wrong. Again, what are the gotchas to allow live backups?

Do you mean backing up a guest while it is running? That should have worked even back in the 6.x days. For VMs you don't need to worry about much. For containers, you should store them on a storage that supports snapshots, like ZFS.

smorvan said:
5) Do I need CPU Pinning for the machine learning VM?

Test it out, but it will probably make scheduling easier.

smorvan said:
6) What about NUMA? Would either machine benefit from this? Do I need to allocate according to the hardware layout (e.g. 64 GB and only 8 cores)?

The CPU you use will not have NUMA nodes, so no need to enable it.

Since the overall setup looks like it will be using consumer hardware, please do yourself a favor and use DC NVMes with power loss protection. Most consumer NVMes/SSDs tend not to do well under constant write load, as you would expect with multiple VMs.

smorvan · Mar 21, 2023

Yes, this will be a development machine, to b

aaron said:
AFAICT your memory plans don't line up that well. The CPU can usually be overprovisioned quite a bit, as long as the guests don't use it all the time. ML workloads sound like they will be utilizing the CPU constantly, and that is when you don't want to overprovision the CPU that much.

Do you mean to say that I should split the memory differently (per channel?)?

aaron said:
Do you mean backing up a guest while it is running? That should have worked even back in the 6.x days. For VMs you don't need to worry about much. For containers, you should store them on a storage that supports snapshots, like ZFS.

Ok, I am planning on using ZFS for the OS, and bulk storage for the data.

aaron said:
Test it out, but it will probably make scheduling easier.

Will do - thanks.

aaron said:
Since the overall setup looks like it will be using consumer hardware, please do yourself a favor and use DC NVMes with power loss protection. Most consumer NVMes/SSDs tend not to do well under constant write load, as you would expect with multiple VMs.

Yes, this will be used until we can justify something beefier.

Good point regarding the NVMes - thanks! I bought Samsung 980 Pro and 990 Pro NVMe drives, I guess I will have to return them.

aaron · Mar 21, 2023

smorvan said:
Do you mean to say that I should split the memory differently (per channel?)?

80 GB + 32 GB + 20 GB = 132 GB > 128 GB

But since the 20 GB are planned for the host itself, the host might simply end up with a bit less. With fast NVMes, the ZFS ARC (cache in RAM) isn't that important anyway.
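The arithmetic above can be written as a quick budget check. This is only a sketch: the figures are the ones from the plan in the first post, and the dictionary keys and function name are hypothetical labels chosen for illustration.

```python
# Planned RAM per consumer in GB, taken from the layout in the first post.
# The keys are illustrative labels, not Proxmox identifiers.
PLANNED_RAM_GB = {
    "ml_vm": 80,
    "office_vm": 32,
    "host": 20,
}
INSTALLED_RAM_GB = 128


def ram_overcommit_gb(planned, installed):
    """Return by how many GB the plan exceeds installed RAM (negative means headroom)."""
    return sum(planned.values()) - installed


overcommit = ram_overcommit_gb(PLANNED_RAM_GB, INSTALLED_RAM_GB)
print(f"planned {sum(PLANNED_RAM_GB.values())} GB of {INSTALLED_RAM_GB} GB, over by {overcommit} GB")
```

Any positive result has to be taken from one of the three allocations before the plan fits into the installed memory.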

smorvan · Mar 21, 2023

Gotcha - this was a mistake on my end. That total should come down to 128 GB. I will shave off 4 GB from somewhere, I do not want to be running a ballooning helper.

