Hello Proxmox Community,
I'm writing to get some advice on a Proxmox cluster project involving two main homeservers and a third site for backup storage. My aim is to enhance individual service availability by using already available hardware and high availability (HA). Both main sites host various VMs that are accessible via the internet, including Vaultwarden and game servers, as well as some VMs for internal use like Pi-hole and Home Assistant. The setup should also include a backup strategy to ensure resilience against internet connection failures and node failures.
I have some specific questions:
1. Using ZFS, is there an issue with SATA+USB RAID-1?
I know it will slow down to the slowest drive involved, but will that impact the performance of PVE? If the VMs are on the NVMe, will a degraded PVE slow down the VMs because of that?
2. Will it be an issue to have VMs and PVE installation on a single drive? I've read both arguments for separating them and for putting everything on a single drive, but not much about the reasons behind each approach. My take is that having everything on one RAID-1 drive offers the highest availability, correct?
3. Could the PVE host system also be backed up to the HDD RAID for when the drive fails? Is it beneficial to do so, considering PVE can be quickly reinstalled on a new drive anyway?
4. I've read that slow data degradation can occur due to ZFS's deduplication. Is this issue mitigated by replicating backups across two sites? Are these errors automatically corrected by ZFS, or do they require active monitoring and intervention?
Below is a detailed explanation of my current setup and plans. Any insights, suggestions, or hints about potential problems that could arise would be greatly appreciated.
Proxmox Cluster Overview
The goal is to connect two Proxmox homeservers located at two main sites and add a third site for backup storage. Both main sites host different VMs that connect to the internet (e.g., Vaultwarden and game servers). By leveraging replication and high availability (HA), we aim to improve overall service availability.
Cluster Setup: Family-Cluster
The cluster comprises three locations.
* A fourth site may be added later, but it will only be connected via the VPN and may use the setup for backups.
| Location | Network | Node | Internet Connection (Mbit/s, down/up) |
|---|---|---|---|
| S1 | 192.168.10.0/24 | pve1 | 100/5 (cable) |
| S2 | 192.168.20.0/24 | pve2 | 1000/50 (cable) |
| S3 | 192.168.30.0/24 | pve3 | 100/50 (fiber) |
Network Connectivity
All locations are interconnected via VPN (Fritz!Box to Fritz!Box VPN).
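As a quick sanity check on the addressing: a routed site-to-site VPN generally needs the LANs on each end to use distinct, non-overlapping subnets. The following is only a minimal sketch using Python's standard `ipaddress` module; the `SITE_NETWORKS` dictionary and the `overlapping_sites` helper are names I made up for illustration, and the subnets are simply copied from the table above.

```python
import ipaddress
from itertools import combinations

# Site subnets copied from the location table (names are illustrative).
SITE_NETWORKS = {
    "S1": ipaddress.ip_network("192.168.10.0/24"),
    "S2": ipaddress.ip_network("192.168.20.0/24"),
    "S3": ipaddress.ip_network("192.168.30.0/24"),
}


def overlapping_sites(networks):
    """Return every pair of site names whose subnets overlap."""
    return [
        (name_a, name_b)
        for (name_a, net_a), (name_b, net_b) in combinations(networks.items(), 2)
        if net_a.overlaps(net_b)
    ]


if __name__ == "__main__":
    # An empty list means no two sites share address space.
    print(overlapping_sites(SITE_NETWORKS))
```

If a fourth site is added later, its subnet could be added to the dictionary and checked the same way before setting up the VPN.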
Applications in Use
- Local VMs:
  - Pi-hole
  - Home Assistant
  - Office computers (for remote access using thin clients)
  - OMV (or another NAS for Time Machine backups from Win/Mac)
- Services Accessible via VPN:
  - Nextcloud
- Services Accessible from the Internet:
  - Vaultwarden
  - FoundryVTT
  - Various other game servers
High Availability
The cluster aims to ensure resilience against two main scenarios:
1. Internet Connection Failure:
- Issues like VPN breakdowns, loss of external accessibility, or ISP problems (e.g., Vodafone/Telekom).
- High Availability (HA) setup for critical services like Vaultwarden, game servers, and potentially Nextcloud.
2. Node Failure:
- Backup plans should be robust enough to handle the complete loss of a location → all data is backed up at at least two sites.
- If possible: securing "local" services and VMs like Home Assistant via external nodes (increased latency but prioritizing availability).
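To reason about the two failure scenarios, it may help to look at them in terms of cluster quorum. The sketch below is my own simplified model, not Proxmox code: it assumes each of the three nodes carries one vote and that a partition needs a strict majority of votes to stay quorate. The names `has_quorum` and `partition_view` are hypothetical helpers.

```python
# Simplified quorum model (assumption: one vote per node, strict majority).
NODES = ("pve1", "pve2", "pve3")


def has_quorum(reachable_votes: int, total_votes: int) -> bool:
    """Strict majority: more than half of all votes must be reachable."""
    return reachable_votes > total_votes // 2


def partition_view(isolated: set) -> dict:
    """Report, per node, whether it still sees a majority of votes.

    Isolated nodes are assumed to see only themselves; all other
    nodes are assumed to still see each other over the VPN.
    """
    connected_count = len([node for node in NODES if node not in isolated])
    view = {}
    for node in NODES:
        visible_votes = 1 if node in isolated else connected_count
        view[node] = has_quorum(visible_votes, len(NODES))
    return view


if __name__ == "__main__":
    # Scenario: S1 loses its internet connection (and with it the VPN).
    print(partition_view({"pve1"}))
```

In this model, a site that loses its VPN link is the one that drops out of quorum, while the other two sites remain quorate. That would matter for "local" services like Home Assistant, which are needed most at the site that has just been cut off.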
Backup Strategy
- PBS Installation: Parallel to PVE installation on the same drive (SSD or NVMe).
- Local Backups: Individual backups on SSD and/or HDD (single or RAID-1 configuration).
- Centralized Backups:
- Aggregated backups from all nodes expected at S3 (2x 8TB HDD RAID-1).
- Aim for deduplication using ZFS on all drives (considering upload limitations).
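To put the upload limitation into numbers, here is a rough back-of-the-envelope sketch. It assumes the connection figures in the location table are Mbit/s (download/upload), that 1 GB is 10^9 bytes, and that only a fraction of the nominal uplink is usable (the `efficiency` parameter is a guess, not a measured value). `transfer_hours` and `UPLINK_MBIT` are illustrative names.

```python
# Assumed upload speeds in Mbit/s, taken from the location table.
UPLINK_MBIT = {"S1": 5, "S2": 50, "S3": 50}


def transfer_hours(size_gb: float, uplink_mbit: float, efficiency: float = 0.8) -> float:
    """Estimate hours needed to upload size_gb over the given uplink.

    efficiency is the assumed usable share of the nominal bandwidth
    (VPN and protocol overhead, other traffic).
    """
    size_bits = size_gb * 8 * 1000**3
    usable_bits_per_second = uplink_mbit * 1_000_000 * efficiency
    return size_bits / usable_bits_per_second / 3600


if __name__ == "__main__":
    for site, mbit in UPLINK_MBIT.items():
        hours = transfer_hours(100, mbit)
        print(f"{site}: 100 GB initial backup takes about {hours:.1f} h")
```

Even without any overhead, 100 GB over a 5 Mbit/s uplink works out to roughly 44 hours, which is why the first full backup from S1 and the size of later incremental transfers are a concern.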
Details on Single Site (S1)
- VMs: Pi-hole, Home Assistant, Nextcloud, Vaultwarden, Paperless
- Hardware: Intel NUC i5-5250U, 16GB RAM
- Storage Options/Connections:
- 1x M.2 PCIe4 NVMe
- 2x SATA
- 4x USB3 (5Gbit/s)
Proposed Storage Distribution:
- PVE and PBS will be installed on the NVMe.
- Backup via one external USB drive (SSD or HDD).
- Backups don't need RAID here, as they are mirrored to another site (S3) and stored there on a RAID-1.
Issues:
- When the NVMe drive fails, the system is completely offline.
Details on Single Site (S2)
- VMs: Pi-hole, Home Assistant, FoundryVTT
- Hardware: NucBox G3 Intel N100, 16GB RAM (to be upgraded to 32GB)
- Storage Options/Connections:
- 1x M.2 PCIe NVMe
- 1x M.2 PCIe Wifi
- 1x M.2 SATA
- 4x USB3 (5Gbit/s)
Proposed Storage Distribution:
- Plan to purchase an adapter card M.2 E-key to A-key to get a second NVMe drive.
- PVE and PBS will be installed on the M.2 SATA (plus maybe RAID-1 with one external USB SSD).
- Both NVMe drives will be configured in RAID-1 for VMs.
- Backup via one external USB drive (SSD or HDD).
Issues:
- When the SATA drive fails, the system is completely offline.
- Booting from the NVMe connected via the adapter in the WiFi slot may not work. Would that also be a problem for a RAID that includes this drive?
Details on Single Site (S3)
- VMs: Pi-hole
- Hardware: ASUS PRIME N100I-D D4, 32GB RAM
- Storage Options/Connections:
- 1x M.2 PCIe NVMe
- 1x M.2 PCIe Wifi
- 1x SATA
- 2x USB3 (10Gbit/s)
- 2x USB3 (5Gbit/s)
- 1x PCIe 3.0 x1 with extension card to internal 4x SATA 6Gbit/s
Proposed Storage Distribution:
- NVMe: PVE and PBS installation + VMs (if they have to be migrated).
- 2x SATA 8TB HDD with RAID-1 for backup of all data (VMs from all sites, Nextcloud storage, Time Machine backups, etc.).