Help Needed to Design Ideal ZFS Setup on PVE

pvpaulo

Member
Jun 15, 2022
Good morning everyone,


I could use your help in designing the best setup for my current Proxmox environment.


Here's the current scenario:


  • I'm running Proxmox VE with 4 x 1TB disks.
  • I've created a ZFS pool using RAID10 (striped mirrors; see the sketch after this list), resulting in a pool named VMS with ~2TB of usable space.
  • I'm virtualizing a Proxmox Backup Server (PBS) inside this PVE.
  • I’d like to use the VMS pool to allocate virtual disks for PBS and also for other virtual machines, such as one that provides Samba file sharing.
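
For reference, a minimal sketch of how a striped-mirror ("RAID10") pool like VMS is typically created from the command line; the by-id device names below are placeholders, not my actual disks:

zpool create -o ashift=12 VMS \
  mirror /dev/disk/by-id/ata-DISK1 /dev/disk/by-id/ata-DISK2 \
  mirror /dev/disk/by-id/ata-DISK3 /dev/disk/by-id/ata-DISK4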

However, I'm facing issues:


  • The PBS virtual disk stored in the VMS pool is showing errors.
  • Another VM using the same pool (Samba server) is reporting I/O errors.
  • The zpool status command reports the VMS pool is degraded.

Given this situation, what would be the best approach to using ZFS in this setup? Should I reconsider how I'm allocating virtual disks from the pool, or is it more likely a hardware or configuration issue with ZFS?


Any suggestions or insights would be greatly appreciated.


Thank you in advance!
 
PBS itself is also running ZFS, so you end up with ZFS-on-ZFS, which is often a performance killer; many people in this forum will advise against it.

Do you see the I/O errors on the PVE host in zpool status?
 
Also, having PBS as a VM within Proxmox is not really a reliable backup; coupled with the fact that all of this resides within one ZFS pool, it is basically a useless setup.
 
  • I'm running Proxmox VE with 4 x 1TB disks.
  • I've created a ZFS pool using RAID10, resulting in a pool named VMS with ~2TB of usable space.
  • The zpool status command reports the VMS pool is degraded.
Maybe show the output of zpool status for the VMS pool (in CODE-tags)? Are you using SMR or QLC drives? Maybe we can find out the actual problem with your physical zpool and suggest improvements?
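
For example, something like this (the device name is just a placeholder) would show the pool state and the drive models so we can check for SMR/QLC:

zpool status -v VMS
lsblk -o NAME,MODEL,SIZE,ROTA
smartctl -a /dev/sda | grep -i -E 'model|rotation'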
 
Since the environment is still being created, I have already removed the disks and formatted them, and I am now looking into a better way to set this up.

Since I have 4 x 1TB disks, running RAID 10 will give me 2TB of storage.

However, I want to run PBS and other services in VMs inside PVE.

I thought about creating a ZFS pool with 2 x 1TB disks for the local VMs.

Then pass the other 2 x 1TB disks through to the PBS VM and create a ZFS pool only inside the PBS VM.

Would this be a good approach?
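
If it helps, this is roughly what I have in mind for the passthrough part (VM ID 100 and the device paths are just placeholders):

qm set 100 -scsi1 /dev/disk/by-id/ata-DISK3
qm set 100 -scsi2 /dev/disk/by-id/ata-DISK4
# then, inside the PBS VM (disk names will differ):
zpool create backup mirror /dev/sdb /dev/sdc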
 
We had trouble with PBS until we decided to install it on another host altogether (CPU, RAM, HDD). It was nothing but trouble until it was isolated (hardware-wise) from the VMs and the drives those VMs are on; now it's backing up fine. You can install it on Proxmox, but let that host be for PBS purposes only and it'll work fine.
EDIT: we have a SAN that all backups go to, so there is no local HDD contention during backups.
 
I'm virtualizing a Proxmox Backup Server (PBS) inside this PVE.
What's the point of doing this?

  • Another VM using the same pool (Samba server) is reporting I/O errors.
  • The zpool status command reports the VMS pool is degraded.
Fix your filesystem problems first. Your ZFS layout isn't the issue, but the storage subsystem is somehow defective.

Once we get past the "don't store your payload and backups on the same storage" point: if you really want to run PBS nested on top of PVE, there's nothing stopping you from using native storage from the host (NFS, virtiofs, or a bind mount into a container) or external storage, so you don't end up with write amplification.
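
A rough sketch of the bind-mount variant, assuming PBS runs in container 101 and the dataset and mount-point names are placeholders:

zfs create VMS/pbs-datastore
pct set 101 -mp0 /VMS/pbs-datastore,mp=/mnt/datastore
# inside the container:
proxmox-backup-manager datastore create store1 /mnt/datastore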
 
Since the environment is still being created,
That is exactly why you received the earlier replies advising you to change the structure of your setup.

Think about what you expect to do if your PVE node fails & needs a rebuild/reinstall. Where will your VM backups be?
 
Since I have 4 x 1TB disks, running RAID 10 will give me 2TB of storage.
Whether this is a good approach depends on the type of drives (which you apparently don't want to share), not on the ZFS pool layout.

However, I want to run PBS and other services in VMs inside PVE.
Also, having PBS as a VM within Proxmox is not really a reliable backup; coupled with the fact that all of this resides within one ZFS pool, it is basically a useless setup.
Please make sure to keep additional copies of the data, with at least one in another location. PBS has remote sync, which is perfect for that.
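
As a rough example of such a sync (names, host, and schedule are placeholders), configured on the remote PBS that pulls the backups:

proxmox-backup-manager remote create local-pbs --host 192.0.2.10 --auth-id sync@pbs --password 'SECRET' --fingerprint '<fingerprint>'
proxmox-backup-manager sync-job create pull-local --remote local-pbs --remote-store datastore1 --store offsite --schedule daily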

I thought about creating a ZFS pool with 2 x 1TB disks for the local VMs.

Then pass the other 2 x 1TB disks through to the PBS VM and create a ZFS pool only inside the PBS VM.
That's also possible (but please don't use SMR or QLC drives). Alternatively, install PBS as a container (instead of a VM) so you can use the PVE storage directly (and avoid ZFS on ZFS), or install PBS without ZFS (which also avoids ZFS on ZFS).

What's the point of doing this?
I also use a local PBS (as a container) on each PVE for quick backups (and restores, as long as the hardware does not fail) and sync between them regularly (they are in different places with a slower connection).

Fix your filesystem problems first. Your ZFS layout isn't the issue, but the storage subsystem is somehow defective.
Since the environment is still being created, I have already removed the disks and formatted them.
That won't fix the underlying issue, and I also think you should investigate it first.
 
I keep running into this problem:

[screenshot: 1750419738213.png]

[screenshot: 1750419768108.png]


I now have a 16 TB ZFS pool.

I allocated 1 TB to an internal SFTP VM.

And I allocated 8 TB to my Proxmox Backup Server VM.

However, whenever I write something like a backup to the Proxmox Backup Server VM, the SFTP VM stops working, throws an I/O error, and its filesystem gets corrupted.

I am using ext4 on both VM filesystems.

What could this problem be?

Could it be the size of the disk allocated to the Proxmox backup VM?

Could it be a physical problem with one of the disks I used to create the ZFS pool?

Or could it be some limitation in the ZFS configuration?

This is my current environment:

[screenshot: 1750419854593.png]

[screenshot: 1750419891903.png]

The environment is simple, but I am running into these problems.

[screenshot: 1750420305233.png]
 
Could it be a physical problem with one of the disks I used to create the ZFS pool?
As multiple people already said in this thread: yes.
It can also be a cable/connector, controller, or memory problem instead of a drive media problem, but it's typically a hardware issue. (Your other screenshot of the error messages on the console is unreadable to me.)
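
For example, a scrub while watching the kernel log usually points in the right direction:

zpool scrub VM
zpool status -v VM
# in a second terminal, watch for SATA/SCSI resets or controller errors:
dmesg -w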
 
# dmesg

Here are the relevant entries:
[245623.366930] Buffer I/O error on dev zd0, logical block 512, async page read
[245623.451311] Buffer I/O error on dev zd0p1, logical block 2052, async page read
[245623.451320] Buffer I/O error on dev zd0p1, logical block 2049, async page read
[245623.451374] Buffer I/O error on dev zd0p1, logical block 2048, async page read
[245623.451375] Buffer I/O error on dev zd0p1, logical block 2050, async page read
[245623.451384] Buffer I/O error on dev zd0p1, logical block 2053, async page read
[245623.451388] Buffer I/O error on dev zd0p1, logical block 2051, async page read
[245623.452894] Buffer I/O error on dev zd0p1, logical block 2055, async page read



# zpool status -v
  pool: VM
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
config:

        NAME                                        STATE     READ WRITE CKSUM
        VM                                          ONLINE       0     0     0
          mirror-0                                  ONLINE       0     0     0
            scsi-36842b2b045dde9002fbe0c431468ba0c  ONLINE       0     0     0
            scsi-36842b2b045dde9002fbe0c5815a36501  ONLINE       0     0     0
          mirror-1                                  ONLINE       0     0     0
            scsi-36842b2b045dde9002fbe0c6e16fbab1b  ONLINE       0     0   205
            scsi-36842b2b045dde9002fbe0c7f17ffe33b  ONLINE       0     0   205

errors: Permanent errors have been detected in the following files:

        VM/vm-101-disk-0:<0x1>
        VM/vm-100-disk-0:<0x1>
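
Given the permanent errors on the two zvols, the usual next step (once the affected VM disks have been restored from a backup and the hardware problem is addressed) would be something like:

zpool clear VM
zpool scrub VM
zpool status -v VM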