Greetings Proxmox developers and users!
First of all, I would like to thank the Proxmox team for PBS - we just started using it within our infrastructure and so far we are seeing great results.
Just to share: our new shiny backup server is built on a Supermicro 2U platform with 2 x Xeon 4214R CPUs and 128GB RAM. The backup pool consists of 12 x 14TB HGST SAS drives in ZFS RAIDZ-3: 11 drives form the RAIDZ-3 vdev and one SAS drive is a hot spare (we want all storage available under a single pool). The pool is also backed by 2 x Intel NVMe SSDs as log and cache devices; all together this lets us achieve 1.5-2 GB/s write speeds on ZFS. The PBS is connected via 2 x 10G Ethernet (bonded) to the same network switch where the main cluster nodes reside (also on 10G networking). Total: 98 TB of storage available.
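For reference, the pool layout is roughly the sketch below (device names are placeholders, not our actual ones; the mirrored SLOG and the L2ARC split across the two NVMe SSDs are simply how we partitioned them):

# rough sketch of the layout - all device names below are illustrative
zpool create saspool raidz3 sda sdb sdc sdd sde sdf sdg sdh sdi sdj sdk
zpool add saspool spare sdl                          # 12th SAS drive as hot spare
zpool add saspool log mirror nvme0n1p1 nvme1n1p1     # SLOG mirrored on the Intel NVMe SSDs
zpool add saspool cache nvme0n1p2 nvme1n1p2          # L2ARC on the remaining NVMe space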
As a new PBS user I have two questions so far:
1) We just did some initial test backups from our Proxmox cluster. For this purpose I created a ZFS dataset under the main pool called "saspool/testbackup". After that, I added a new datastore in PBS and connected the Proxmox cluster to it. Everything worked as expected; however, after finishing our tests I decided to create a new datastore and remove the test one. Removing the datastore from the PBS GUI was no problem, but I cannot destroy the "saspool/testbackup" ZFS dataset on the server itself:
# zfs destroy saspool/testbackup
cannot unmount '/saspool/testbackup': unmount failed
#
Obviously something is still using this filesystem path, and here are my findings:
# lsof -n /saspool/testbackup
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
proxmox-b 3386 backup 20u REG 0,55 0 3 /saspool/testbackup/.lock
#
How do I find out why PBS is still using this dataset, and what's the best way to get rid of this lock so that I can destroy the ZFS dataset?
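For what it's worth, my naive guess is that bouncing the proxy would release the handle and let the destroy go through, roughly like this (just a guess on my side, I'd rather learn the proper procedure):

# guess only: force proxmox-backup-proxy to drop its open .lock handle
systemctl restart proxmox-backup-proxy
zfs destroy saspool/testbackup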
2) A question regarding backup speeds. As written above, we're using a 10G network and the ZFS pool itself sustains 1 GB/s+ write speeds. However, I noticed that we get 100-110 MiB/s while a backup to PBS is running, which is not bad of course, but could be better. My next finding was that when I launch backups on the 3-node cluster (so 3 backup jobs run in parallel, one from each node), I see the same 100-110 MiB/s from each node, i.e. 300-330 MiB/s total write speed to PBS. This leads me to ask: is there some tunable that sets this 100-110 MiB/s per-job limit?
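In case it helps narrow this down, this is what I was planning to check next, assuming these are the relevant knobs (the repository string is just an example, not our real one):

# look for a vzdump bandwidth limit on the PVE nodes (value is KiB/s)
grep -i bwlimit /etc/vzdump.conf /etc/pve/datacenter.cfg
# measure single-stream TLS/compression throughput from a node to PBS
proxmox-backup-client benchmark --repository backup@pbs@pbs.example.local:testbackup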
cheers,