Proxmox Backup Server

devis

Hello, I am using Proxmox VE with Proxmox Backup Server.
My backups run over an internal 10 gigabit network, and the Proxmox Backup Server uses ZFS with 32 GB of memory allocated to the ARC.
Storage capacity: 47 TB, of which 40 TB is used
RAM: 64 GB
CPU: Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
Storage disks: 12x 8 TB SATA (Seagate Exos 7E8)
VMs: 821 backup groups, 1681 snapshots

However, I am seeing the following problems:
1. I use a schedule to verify all backups (the job setup is sketched below), but this process is extremely slow and can take up to a week. While it is running, there are communication problems when fetching backups: when I try to load the list of backups for any machine on the hypervisor, I get "Connection timeout (596)" or "Communication failure (0)" errors.
2. While the backup verification job is running, the Proxmox Backup Server web interface also becomes sluggish.

Can you please tell me how to fix these problems?
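For context, this is roughly how the verification schedule can be inspected (just a sketch; the job ID below is made up):
Bash:
# List the configured verification jobs and their schedules
proxmox-backup-manager verify-job list

# Show the settings of a single job (the job ID is only an example)
proxmox-backup-manager verify-job show v-all-weekly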
 
~# pveversion --verbose
proxmox-ve: 7.3-1 (running kernel: 5.15.85-1-pve)
pve-manager: 7.4-3 (running version: 7.4-3/9002ab8a)
pve-kernel-5.15: 7.4-1
pve-kernel-helper: 7.3-3
pve-kernel-5.15.104-1-pve: 5.15.104-1
pve-kernel-5.15.85-1-pve: 5.15.85-1
pve-kernel-5.15.74-1-pve: 5.15.74-1
ceph-fuse: 15.2.17-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4-2
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-4
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-1
libpve-rs-perl: 0.7.5
libpve-storage-perl: 7.4-2
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.4.1-1
proxmox-backup-file-restore: 2.4.1-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.6.5
pve-cluster: 7.3-3
pve-container: 4.4-3
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-2
pve-firewall: 4.3-1
pve-firmware: 3.6-4
pve-ha-manager: 3.6.0
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-1
qemu-server: 7.4-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.9-pve1
~# proxmox-backup-manager versions --verbose
proxmox-backup 2.4-1 running kernel: 5.15.107-2-pve
proxmox-backup-server 2.4.2-2 running version: 2.4.2
pve-kernel-5.15 7.4-4
pve-kernel-5.15.107-2-pve 5.15.107-2
pve-kernel-5.15.74-1-pve 5.15.74-1
ifupdown2 3.1.0-1+pmx4
libjs-extjs 7.0.0-1
proxmox-backup-docs 2.4.2-1
proxmox-backup-client 2.4.2-1
proxmox-mail-forward 0.1.1-1
proxmox-mini-journalreader 1.2-1
proxmox-offline-mirror-helper unknown
proxmox-widget-toolkit 3.7.3
pve-xtermjs 4.16.0-2
smartmontools 7.2-pve3
zfsutils-linux 2.1.11-pve1
 
ZFS Subsystem Report Wed Jul 05 21:10:31 2023
Linux 5.15.107-2-pve 2.1.11-pve1
Machine: bkp5 (x86_64) 2.1.11-pve1

ARC status: HEALTHY
Memory throttle count: 0

ARC size (current): 100.2 % 30.1 GiB
Target size (adaptive): 100.0 % 30.0 GiB
Min size (hard limit): 100.0 % 30.0 GiB
Max size (high water): 1:1 30.0 GiB
Most Frequently Used (MFU) cache size: 61.2 % 7.0 GiB
Most Recently Used (MRU) cache size: 38.8 % 4.4 GiB
Metadata cache size (hard limit): 100.0 % 30.0 GiB
Metadata cache size (current): 94.5 % 28.3 GiB
Dnode cache size (hard limit): 100.0 % 30.0 GiB
Dnode cache size (current): 35.3 % 10.6 GiB

ARC hash breakdown:
Elements max: 3.5M
Elements current: 26.5 % 920.0k
Collisions: 98.1M
Chain max: 7
Chains: 46.8k

ARC misc:
Deleted: 624.1M
Mutex misses: 5.6M
Eviction skips: 99.0M
Eviction skips due to L2 writes: 0
L2 cached evictions: 0 Bytes
L2 eligible evictions: 69.8 TiB
L2 eligible MFU evictions: 2.9 % 2.1 TiB
L2 eligible MRU evictions: 97.1 % 67.7 TiB
L2 ineligible evictions: 4.4 TiB

ARC total accesses (hits + misses): 2.1G
Cache hit ratio: 81.1 % 1.7G
Cache miss ratio: 18.9 % 388.0M
Actual hit ratio (MFU + MRU hits): 81.1 % 1.7G
Data demand efficiency: 48.1 % 378.3M
Data prefetch efficiency: < 0.1 % 89.4M

Cache hits by cache type:
Most frequently used (MFU): 68.1 % 1.1G
Most recently used (MRU): 31.9 % 532.0M
Most frequently used (MFU) ghost: 4.0 % 66.7M
Most recently used (MRU) ghost: 1.3 % 22.4M

Cache hits by data type:
Demand data: 10.9 % 182.1M
Prefetch data: < 0.1 % 7.9k
Demand metadata: 89.1 % 1.5G
Prefetch metadata: < 0.1 % 549.6k

Cache misses by data type:
Demand data: 50.6 % 196.2M
Prefetch data: 23.0 % 89.4M
Demand metadata: 26.4 % 102.3M
Prefetch metadata: < 0.1 % 85.0k

DMU prefetch efficiency: 27.0M
Hit ratio: 49.1 % 13.3M
Miss ratio: 50.9 % 13.8M

L2ARC not detected, skipping section

Solaris Porting Layer (SPL):
spl_hostid 0
spl_hostid_path /etc/hostid
spl_kmem_alloc_max 1048576
spl_kmem_alloc_warn 65536
spl_kmem_cache_kmem_threads 4
spl_kmem_cache_magazine_size 0
spl_kmem_cache_max_size 32
spl_kmem_cache_obj_per_slab 8
spl_kmem_cache_reclaim 0
spl_kmem_cache_slab_limit 16384
spl_max_show_tasks 512
spl_panic_halt 0
spl_schedule_hrtimeout_slack_us 0
spl_taskq_kick 0
spl_taskq_thread_bind 0
spl_taskq_thread_dynamic 1
spl_taskq_thread_priority 1
spl_taskq_thread_sequential 4



VDEV cache disabled, skipping section

ZIL committed transactions: 15.2M
Commit requests: 170.2k
Flushes to stable storage: 170.1k
Transactions to SLOG storage pool: 0 Bytes 0
Transactions to non-SLOG storage pool: 6.7 GiB 184.0k
 
Yep. You get what you pay for. The HDDs are probably the bottleneck, as they can't handle the random reads fast enough.
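One way to confirm that (just a sketch, using the pool name from the zpool status posted below) is to watch the disks while a verify task is running:
Bash:
# Per-vdev latency and throughput of the backup pool, refreshed every 5 seconds
zpool iostat -v Backups-Storage1 5

# Per-disk utilization and wait times (iostat is part of the sysstat package)
iostat -x 5
If the HDDs sit near 100 % utilization with high await times while throughput stays low, the random-read pattern of the verify job, not the network, is the limit.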
 
ZFS raidz1? raidz2?
IMO this is expected with HDD-only storage.
Bash:
zpool status
  pool: Backups-Storage1
 state: ONLINE
config:

    NAME                                  STATE     READ WRITE CKSUM
    Backups-Storage1                      ONLINE       0     0     0
      mirror-0                            ONLINE       0     0     0
        ata-ST8000NM000A-2KE101_WKD2KGQE  ONLINE       0     0     0
        ata-ST8000NM000A-2KE101_WKD2MY6A  ONLINE       0     0     0
      mirror-1                            ONLINE       0     0     0
        ata-ST8000NM000A-2KE101_WKD2N2DB  ONLINE       0     0     0
        ata-ST8000NM000A-2KE101_WKD2NVWF  ONLINE       0     0     0
      mirror-2                            ONLINE       0     0     0
        ata-ST8000NM000A-2KE101_WKD2PKLH  ONLINE       0     0     0
        ata-ST8000NM000A-2KE101_WKD2PQQN  ONLINE       0     0     0
      mirror-3                            ONLINE       0     0     0
        ata-ST8000NM000A-2KE101_WKD2Q4Q6  ONLINE       0     0     0
        ata-ST8000NM000A-2KE101_WKD2SQCH  ONLINE       0     0     0
      mirror-4                            ONLINE       0     0     0
        ata-ST8000NM000A-2KE101_WKD2SQG0  ONLINE       0     0     0
        ata-ST8000NM000A-2KE101_WKD2V7EB  ONLINE       0     0     0
      mirror-5                            ONLINE       0     0     0
        ata-ST8000NM000A-2KE101_WKD2VBDD  ONLINE       0     0     0
        ata-ST8000NM000A-2KE101_WKD2W94Y  ONLINE       0     0     0

errors: No known data errors

  pool: rpool
 state: ONLINE
config:

    NAME                                                  STATE     READ WRITE CKSUM
    rpool                                                 ONLINE       0     0     0
      mirror-0                                            ONLINE       0     0     0
        ata-INTEL_SSDSC2BB120G4_BTWL428306LR120LGN-part3  ONLINE       0     0     0
        ata-INTEL_SSDSC2BB120G4_BTWL42740501120LGN-part3  ONLINE       0     0     0

errors: No known data errors
 
I would at least add a mirror of enterprise SSDs (for example 2x 500 GB) as a special device so the metadata doesn't have to be read/written from/to the slow HDDs. That would make GC an order of magnitude faster, and verify/backup/restore performance a bit better as well.
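A minimal sketch of what adding such a special vdev could look like (the device paths are placeholders, not actual disks):
Bash:
# Add a mirrored special vdev to the backup pool; use the by-id paths of the new SSDs
zpool add Backups-Storage1 special mirror \
    /dev/disk/by-id/ata-ENTERPRISE_SSD_1 /dev/disk/by-id/ata-ENTERPRISE_SSD_2
Keep in mind that only metadata written after the special vdev has been added ends up on it; metadata of existing chunks stays on the HDDs until those chunks are rewritten.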
 
Do I understand correctly that I need to create an L2ARC from the existing system disk?
 
I would at least add a mirror of enterprise SSDs (for example 2x 500 GB) as a special device so the metadata doesn't have to be read/written from/to the slow HDDs. That would make GC an order of magnitude faster, and verify/backup/restore performance a bit better as well.
But if I'm not mistaken, part of the metadata is already in memory because of the ARC.

As for GC, I wouldn't say it runs for a particularly long time; most of the time is spent on the verification process, and while verification is running I get errors when accessing the backups of a specific VM from the hypervisor.
 
But if I'm not mistaken, part of the metadata is already in memory because of the ARC.
Yes, but ARC and L2ARC are read caches only. Special devices are not a cache: without them, data and metadata are both stored on the HDDs; with them, metadata is stored on the SSDs and data on the HDDs. So without special devices, all those metadata writes will still hit your slow HDDs, no matter how big your ARC or L2ARC is.
PBS stores everything as chunk files of at most 4 MB (in practice more like 2 MB because of compression). So with 96 TB of backup storage you end up with something like 48 million chunk files. When doing a GC, PBS needs to read and update the atime (i.e. metadata) of all those 48 million files, which is way faster when a pair of SSDs handles the millions of random read+write IOs.
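To get a rough idea of the number of chunks on your datastore, something like this should work (the datastore path below is an assumption, adjust it to where your datastore actually lives):
Bash:
# Show the configured datastores and their paths
proxmox-backup-manager datastore list

# Count the chunk files below the datastore's .chunks directory
find /Backups-Storage1/.chunks -type f | wc -l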
 
Hello everyone, I have installed the SSD drives and the problem disappeared. The topic can be closed.
 
