Performance issues (pbs/zfs and proxmox/ceph)

hellnation

Hello,

I'm using Proxmox Backup Server 3 with three Proxmox VE nodes (7.4).

The backup and restore performance doesn't seem great with the hardware I'm using.

The main issue is with restores: I'm getting about 350 MB/s.

Locally on PBS (ZFS on SSD drives), fio shows random read speeds of about 4500 MB/s.

And locally on the Proxmox nodes (Ceph on NVMe drives), rados bench shows about 1250 MB/s.

The network is 100 Gbit.

So why am I only getting 350 MB/s when restoring? I understand there will be overhead, but something doesn't add up.

Any pointers to increase the performance are welcome :)

Code:
PBS box hardware:

Proxmox Backup Server 3.0-2
Kernel Version Linux 6.2.16-15-pve #1 SMP PREEMPT_DYNAMIC PMX 6.2.16-15 (2023-09-28T13:53Z)
Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz
128GB DDR4 RAM
2 x LSI HBA 9400-16i   
8 x Micron PRO 5400 SSD 7.6TB   
100 Gbit network - NVIDIA Mellanox MT27800
10 Gbit network - Intel® Ethernet Controller X540-AT2

Code:
3 x Proxmox nodes hardware:

Lenovo sr645
2 x ThinkSystem AMD EPYC 7502 32C 180W 2.5GHz Processor
1TB RAM TruDDR4 3200MHz
5 x ThinkSystem U.2 PM983 3.84TB Entry NVMe PCIe 3.0 x4 Hot Swap SSD
2 x ThinkSystem 7mm 5300 240GB Entry SATA 6Gb SSD
1 x ThinkSystem Broadcom 57454 10GBASE-T 4-port OCP Ethernet Adapter
1 x 100 Gbit Mellanox Ethernet Adapter

Below you will find local benchmarks and the Ceph and ZFS configuration.

Code:
Proxmox node

root@pve:~# rados bench -p vm_storage 10 write -b 4M -t 16 --run-name 'pvexxx' --no-cleanup
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 10 seconds or 0 objects
Object prefix: benchmark_data_pvexxx_332153
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -           0
    1      16       312       296   1183.93      1184    0.013909   0.0481707
    2      16       663       647   1293.87      1404   0.0465274    0.048925
    3      16       953       937    1249.2      1160   0.0144783   0.0505117
    4      16      1280      1264   1263.86      1308    0.131314   0.0496239
    5      16      1630      1614   1291.05      1400   0.0178662   0.0493974
    6      16      1938      1922   1281.18      1232   0.0152664   0.0495438
    7      16      2220      2204   1259.29      1128   0.0341177   0.0504078
    8      16      2555      2539   1269.35      1340   0.0113804   0.0500438
    9      16      2830      2814   1250.52      1100   0.0165271   0.0509074
   10      15      3131      3116   1246.25      1208   0.0200455   0.0512501
Total time run:         10.0212
Total writes made:      3131
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     1249.75
Stddev Bandwidth:       110.292
Max bandwidth (MB/sec): 1404
Min bandwidth (MB/sec): 1100
Average IOPS:           312
Stddev IOPS:            27.5729
Max IOPS:               351
Min IOPS:               275
Average Latency(s):     0.0511535
Stddev Latency(s):      0.0434358
Max latency(s):         0.207021
Min latency(s):         0.00991231

Code:
CEPH config:


5 OSDs per node (15 OSDs total)

root@pve:~# ceph config show osd.1
NAME                                             VALUE                                   SOURCE    OVERRIDES  IGNORES
auth_client_required                             cephx                                   file                       
auth_cluster_required                            cephx                                   file                       
auth_service_required                            cephx                                   file                       
cluster_network                                  172.xx.xx.xx/22                         file                       
daemonize                                        false                                   override                   
keyring                                          $osd_data/keyring                       default                     
leveldb_log                                                                              default                     
mon_allow_pool_delete                            true                                    file                       
mon_host                                         172.xx.xx.11 172.xx.xx.12 172.xx.xx.13  file                       
ms_bind_ipv4                                     true                                    file                       
ms_bind_ipv6                                     false                                   file                       
no_config_file                                   false                                   override                   
osd_delete_sleep                                 0.000000                                override                   
osd_delete_sleep_hdd                             0.000000                                override                   
osd_delete_sleep_hybrid                          0.000000                                override                   
osd_delete_sleep_ssd                             0.000000                                override                   
osd_max_backfills                                10                                      default                     
osd_mclock_max_capacity_iops_hdd                 0.000000                                override                   
osd_mclock_max_capacity_iops_ssd                 18970.410801                            mon                         
osd_mclock_scheduler_background_best_effort_lim  999999                                  default                     
osd_mclock_scheduler_background_best_effort_res  593                                     default                     
osd_mclock_scheduler_background_best_effort_wgt  2                                       default                     
osd_mclock_scheduler_background_recovery_lim     2371                                    default                     
osd_mclock_scheduler_background_recovery_res     593                                     default                     
osd_mclock_scheduler_background_recovery_wgt     1                                       default                     
osd_mclock_scheduler_client_lim                  999999                                  default                     
osd_mclock_scheduler_client_res                  1186                                    default                     
osd_mclock_scheduler_client_wgt                  2                                       default                     
osd_pool_default_min_size                        2                                       file                       
osd_pool_default_size                            3                                       file                       
osd_recovery_max_active                          0                                       default                     
osd_recovery_max_active_hdd                      10                                      default                     
osd_recovery_max_active_ssd                      20                                      default                     
osd_recovery_sleep                               0.000000                                override                   
osd_recovery_sleep_hdd                           0.000000                                override                   
osd_recovery_sleep_hybrid                        0.000000                                override                   
osd_recovery_sleep_ssd                           0.000000                                override                   
osd_scrub_sleep                                  0.000000                                override                   
osd_snap_trim_sleep                              0.000000                                override                   
osd_snap_trim_sleep_hdd                          0.000000                                override                   
osd_snap_trim_sleep_hybrid                       0.000000                                override                   
osd_snap_trim_sleep_ssd                          0.000000                                override                   
public_network                                   172.xx.xx.xx/22                         file                       
rbd_default_features                             61                                      default                     
rbd_qos_exclude_ops                              0                                       default                     
setgroup                                         ceph                                    cmdline                     
setuser                                          ceph                                    cmdline

PBS box FIO tests

Code:
FIO tests:

root@pbs:/storage# fio --name=rand-read --ioengine=posixaio --rw=randread --bs=4M --size=4g --numjobs=1 --iodepth=32 --runtime=60 --time_based --end_fsync=1

READ: bw=4617MiB/s (4841MB/s), 4617MiB/s-4617MiB/s (4841MB/s-4841MB/s), io=271GiB (291GB), run=60024-60024msec

root@pbs:/storage# fio --name=rand-write --ioengine=posixaio --rw=randwrite --bs=4M --size=4g --numjobs=1 --iodepth=32 --runtime=60 --time_based --end_fsync=1

WRITE: bw=253MiB/s (266MB/s), 253MiB/s-253MiB/s (266MB/s-266MB/s), io=19.9GiB (21.3GB), run=80372-80372msec

--

Code:
$ zpool get all stornado

stornado  size                           55.9T                          -
stornado  capacity                       0%                             -
stornado  altroot                        -                              default
stornado  health                         ONLINE                         -
stornado  guid                           9671548887958957898            -
stornado  version                        -                              default
stornado  bootfs                         -                              default
stornado  delegation                     on                             default
stornado  autoreplace                    off                            default
stornado  cachefile                      -                              default
stornado  failmode                       wait                           default
stornado  listsnapshots                  off                            default
stornado  autoexpand                     on                             local
stornado  dedupratio                     1.00x                          -
stornado  free                           55.5T                          -
stornado  allocated                      363G                           -
stornado  readonly                       off                            -
stornado  ashift                         12                             local
stornado  comment                        -                              default
stornado  expandsize                     -                              -
stornado  freeing                        0                              -
stornado  fragmentation                  0%                             -
stornado  leaked                         0                              -
stornado  multihost                      off                            default
stornado  checkpoint                     -                              -
stornado  load_guid                      14358635609680744971           -
stornado  autotrim                       off                            default
stornado  compatibility                  off                            default
stornado  feature@async_destroy          enabled                        local
stornado  feature@empty_bpobj            active                         local
stornado  feature@lz4_compress           active                         local
stornado  feature@multi_vdev_crash_dump  enabled                        local
stornado  feature@spacemap_histogram     active                         local
stornado  feature@enabled_txg            active                         local
stornado  feature@hole_birth             active                         local
stornado  feature@extensible_dataset     active                         local
stornado  feature@embedded_data          active                         local
stornado  feature@bookmarks              enabled                        local
stornado  feature@filesystem_limits      enabled                        local
stornado  feature@large_blocks           enabled                        local
stornado  feature@large_dnode            enabled                        local
stornado  feature@sha512                 enabled                        local
stornado  feature@skein                  enabled                        local
stornado  feature@edonr                  enabled                        local
stornado  feature@userobj_accounting     active                         local
stornado  feature@encryption             enabled                        local
stornado  feature@project_quota          active                         local
stornado  feature@device_removal         enabled                        local
stornado  feature@obsolete_counts        enabled                        local
stornado  feature@zpool_checkpoint       enabled                        local
stornado  feature@spacemap_v2            active                         local
stornado  feature@allocation_classes     enabled                        local
stornado  feature@resilver_defer         enabled                        local
stornado  feature@bookmark_v2            enabled                        local
stornado  feature@redaction_bookmarks    enabled                        local
stornado  feature@redacted_datasets      enabled                        local
stornado  feature@bookmark_written       enabled                        local
stornado  feature@log_spacemap           active                         local
stornado  feature@livelist               enabled                        local
stornado  feature@device_rebuild         enabled                        local
stornado  feature@zstd_compress          enabled                        local
stornado  feature@draid                  enabled                        local

--

Code:
root@pbs:/stornado# zpool status
  pool: rpool
 state: ONLINE
config:

        NAME                                                    STATE     READ WRITE CKSUM
        rpool                                                   ONLINE       0     0     0
          mirror-0                                              ONLINE       0     0     0
            ata-HDSTOR_-_HSAV25ST250AX_HS230811158DB1F12-part3  ONLINE       0     0     0
            ata-HDSTOR_-_HSAV25ST250AX_HS230811158DB1F10-part3  ONLINE       0     0     0

errors: No known data errors

  pool: stornado
 state: ONLINE
config:


        NAME                        STATE     READ WRITE CKSUM
        stornado                    ONLINE       0     0     0
          raidz2-0                  ONLINE       0     0     0
            wwn-0x500a0751405fd5d6  ONLINE       0     0     0
            wwn-0x500a075141cdda7e  ONLINE       0     0     0
            wwn-0x500a075141cdf608  ONLINE       0     0     0
            wwn-0x500a075141cddbd9  ONLINE       0     0     0
            wwn-0x500a075141cdf6e8  ONLINE       0     0     0
            wwn-0x500a075141cddd3f  ONLINE       0     0     0
            wwn-0x500a075141cddc8d  ONLINE       0     0     0
            wwn-0x500a075141cdd9b4  ONLINE       0     0     0


errors: No known data errors
 
--
Code:
root@pbs:/stornado# zpool get all rpool
NAME   PROPERTY                       VALUE                          SOURCE
rpool  size                           230G                           -
rpool  capacity                       1%                             -
rpool  altroot                        -                              default
rpool  health                         ONLINE                         -
rpool  guid                           18162911503987951255           -
rpool  version                        -                              default
rpool  bootfs                         rpool/ROOT/pbs-1               local
rpool  delegation                     on                             default
rpool  autoreplace                    off                            default
rpool  cachefile                      -                              default
rpool  failmode                       wait                           default
rpool  listsnapshots                  off                            default
rpool  autoexpand                     off                            default
rpool  dedupratio                     1.00x                          -
rpool  free                           227G                           -
rpool  allocated                      2.75G                          -
rpool  readonly                       off                            -
rpool  ashift                         12                             local
rpool  comment                        -                              default
rpool  expandsize                     -                              -
rpool  freeing                        0                              -
rpool  fragmentation                  1%                             -
rpool  leaked                         0                              -
rpool  multihost                      off                            default
rpool  checkpoint                     -                              -
rpool  load_guid                      8147323369962717621            -
rpool  autotrim                       off                            default
rpool  compatibility                  off                            default
rpool  feature@async_destroy          enabled                        local
rpool  feature@empty_bpobj            active                         local
rpool  feature@lz4_compress           active                         local
rpool  feature@multi_vdev_crash_dump  enabled                        local
rpool  feature@spacemap_histogram     active                         local
rpool  feature@enabled_txg            active                         local
rpool  feature@hole_birth             active                         local
rpool  feature@extensible_dataset     active                         local
rpool  feature@embedded_data          active                         local
rpool  feature@bookmarks              enabled                        local
rpool  feature@filesystem_limits      enabled                        local
rpool  feature@large_blocks           enabled                        local
rpool  feature@large_dnode            enabled                        local
rpool  feature@sha512                 enabled                        local
rpool  feature@skein                  enabled                        local
rpool  feature@edonr                  enabled                        local
rpool  feature@userobj_accounting     active                         local
rpool  feature@encryption             enabled                        local
rpool  feature@project_quota          active                         local
rpool  feature@device_removal         enabled                        local
rpool  feature@obsolete_counts        enabled                        local
rpool  feature@zpool_checkpoint       enabled                        local
rpool  feature@spacemap_v2            active                         local
rpool  feature@allocation_classes     enabled                        local
rpool  feature@resilver_defer         enabled                        local
rpool  feature@bookmark_v2            enabled                        local
rpool  feature@redaction_bookmarks    enabled                        local
rpool  feature@redacted_datasets      enabled                        local
rpool  feature@bookmark_written       enabled                        local
rpool  feature@log_spacemap           active                         local
rpool  feature@livelist               enabled                        local
rpool  feature@device_rebuild         enabled                        local
rpool  feature@zstd_compress          enabled                        local
rpool  feature@draid                  enabled                        local
--
Code:
root@pbs:/stornado# arc_summary


------------------------------------------------------------------------
ZFS Subsystem Report                            Mon Nov 13 16:05:17 2023
Linux 6.2.16-15-pve                                          2.1.13-pve1
Machine: pbs (x86_64)                                        2.1.13-pve1


ARC status:                                                      HEALTHY
        Memory throttle count:                                         0


ARC size (current):                                    95.9 %   98.2 GiB
        Target size (adaptive):                       100.0 %  102.4 GiB
        Min size (hard limit):                         41.3 %   42.2 GiB
        Max size (high water):                            2:1  102.4 GiB
        Most Frequently Used (MFU) cache size:         76.9 %   74.7 GiB
        Most Recently Used (MRU) cache size:           23.1 %   22.4 GiB
        Metadata cache size (hard limit):              75.0 %   76.8 GiB
        Metadata cache size (current):                  2.3 %    1.8 GiB
        Dnode cache size (hard limit):                 10.0 %    7.7 GiB
        Dnode cache size (current):                     4.5 %  353.7 MiB


ARC hash breakdown:
        Elements max:                                               1.7M
        Elements current:                              95.6 %       1.6M
        Collisions:                                                 1.1M
        Chain max:                                                     5
        Chains:                                                    69.5k


ARC misc:
        Deleted:                                                    6.8M
        Mutex misses:                                               2.8k
        Eviction skips:                                              135
        Eviction skips due to L2 writes:                               0
        L2 cached evictions:                                     0 Bytes
        L2 eligible evictions:                                   1.1 TiB
        L2 eligible MFU evictions:                     13.6 %  160.0 GiB
        L2 eligible MRU evictions:                    86.4 %  1012.7 GiB
        L2 ineligible evictions:                                30.7 GiB


ARC total accesses (hits + misses):                                70.6M
        Cache hit ratio:                               86.2 %      60.9M
        Cache miss ratio:                              13.8 %       9.7M
        Actual hit ratio (MFU + MRU hits):             86.2 %      60.9M
        Data demand efficiency:                        65.3 %      16.5M
        Data prefetch efficiency:                       4.7 %       2.8M


Cache hits by cache type:
        Most frequently used (MFU):                    76.4 %      46.5M
        Most recently used (MRU):                      23.6 %      14.4M
        Most frequently used (MFU) ghost:               1.5 %     916.5k
        Most recently used (MRU) ghost:                 0.8 %     499.7k


Cache hits by data type:
        Demand data:                                   17.7 %      10.8M
        Prefetch data:                                  0.2 %     131.7k
        Demand metadata:                               82.1 %      50.0M
        Prefetch metadata:                            < 0.1 %      10.6k


Cache misses by data type:
        Demand data:                                   58.6 %       5.7M
        Prefetch data:                                 27.7 %       2.7M
        Demand metadata:                               13.5 %       1.3M
        Prefetch metadata:                              0.2 %      23.1k


DMU prefetch efficiency:                                            1.4M
        Hit ratio:                                     38.7 %     535.8k
        Miss ratio:                                    61.3 %     847.5k


L2ARC not detected, skipping section


Solaris Porting Layer (SPL):
        spl_hostid                                                     0
        spl_hostid_path                                      /etc/hostid
        spl_kmem_alloc_max                                       1048576
        spl_kmem_alloc_warn                                        65536
        spl_kmem_cache_kmem_threads                                    4
        spl_kmem_cache_magazine_size                                   0
        spl_kmem_cache_max_size                                       32
        spl_kmem_cache_obj_per_slab                                    8
        spl_kmem_cache_reclaim                                         0
        spl_kmem_cache_slab_limit                                  16384
        spl_max_show_tasks                                           512
        spl_panic_halt                                                 0
        spl_schedule_hrtimeout_slack_us                                0
        spl_taskq_kick                                                 0
        spl_taskq_thread_bind                                          0
        spl_taskq_thread_dynamic                                       1
        spl_taskq_thread_priority                                      1
        spl_taskq_thread_sequential                                    4


Tunables:
        dbuf_cache_hiwater_pct                                        10
        dbuf_cache_lowater_pct                                        10
        dbuf_cache_max_bytes                        18446744073709551615
        dbuf_cache_shift                                               5
        dbuf_metadata_cache_max_bytes               18446744073709551615
        dbuf_metadata_cache_shift                                      6
        dmu_object_alloc_chunk_shift                                   7
        dmu_prefetch_max                                       134217728
        ignore_hole_birth                                              1
        l2arc_exclude_special                                          0
        l2arc_feed_again                                               1
        l2arc_feed_min_ms                                            200
        l2arc_feed_secs                                                1
        l2arc_headroom                                                 2
        l2arc_headroom_boost                                         200
        l2arc_meta_percent                                            33
        l2arc_mfuonly                                                  0
        l2arc_noprefetch                                               1
        l2arc_norw                                                     0
        l2arc_rebuild_blocks_min_l2size                       1073741824
        l2arc_rebuild_enabled                                          1
        l2arc_trim_ahead                                               0
        l2arc_write_boost                                        8388608
        l2arc_write_max                                          8388608
        metaslab_aliquot                                         1048576
        metaslab_bias_enabled                                          1
        metaslab_debug_load                                            0
        metaslab_debug_unload                                          0
        metaslab_df_max_search                                  16777216
        metaslab_df_use_largest_segment                                0
        metaslab_force_ganging                                  16777217
        metaslab_fragmentation_factor_enabled                          1
        metaslab_lba_weighting_enabled                                 1
        metaslab_preload_enabled                                       1
        metaslab_unload_delay                                         32
        metaslab_unload_delay_ms                                  600000
        send_holes_without_birth_time                                  1
        spa_asize_inflation                                           24
        spa_config_path                             /etc/zfs/zpool.cache
        spa_load_print_vdev_tree                                       0
        spa_load_verify_data                                           1
        spa_load_verify_metadata                                       1
        spa_load_verify_shift                                          4
        spa_slop_shift                                                 5
        vdev_file_logical_ashift                                       9
        vdev_file_physical_ashift                                      9
        vdev_removal_max_span                                      32768
        vdev_validate_skip                                             0
        zap_iterate_prefetch                                           1
        zfetch_array_rd_sz                                       1048576
        zfetch_max_distance                                     67108864
        zfetch_max_idistance                                    67108864
        zfetch_max_sec_reap                                            2
        zfetch_max_streams                                             8
        zfetch_min_distance                                      4194304
        zfetch_min_sec_reap                                            1
        zfs_abd_scatter_enabled                                        1
        zfs_abd_scatter_max_order                                     10
        zfs_abd_scatter_min_size                                    1536
        zfs_admin_snapshot                                             0
        zfs_allow_redacted_dataset_mount                               0
        zfs_arc_average_blocksize                                   8192
        zfs_arc_dnode_limit                                            0
        zfs_arc_dnode_limit_percent                                   10
        zfs_arc_dnode_reduce_percent                                  10
        zfs_arc_evict_batch_limit                                     10
        zfs_arc_eviction_pct                                         200
        zfs_arc_grow_retry                                             0
        zfs_arc_lotsfree_percent                                      10
        zfs_arc_max                                         109951162778
        zfs_arc_meta_adjust_restarts                                4096
        zfs_arc_meta_limit                                             0
        zfs_arc_meta_limit_percent                                    75
        zfs_arc_meta_min                                               0
        zfs_arc_meta_prune                                         10000
        zfs_arc_meta_strategy                                          1
        zfs_arc_min                                          45354854646
        zfs_arc_min_prefetch_ms                                        0
        zfs_arc_min_prescient_prefetch_ms                              0
        zfs_arc_p_dampener_disable                                     1
        zfs_arc_p_min_shift                                            0
        zfs_arc_pc_percent                                             0
        zfs_arc_prune_task_threads                                     1
        zfs_arc_shrink_shift                                           0
        zfs_arc_shrinker_limit                                     10000
 
Code:
        zfs_arc_sys_free                                               0
        zfs_async_block_max_blocks                  18446744073709551615
        zfs_autoimport_disable                                         1
        zfs_btree_verify_intensity                                     0
        zfs_checksum_events_per_second                                20
        zfs_commit_timeout_pct                                         5
        zfs_compressed_arc_enabled                                     1
        zfs_condense_indirect_commit_entry_delay_ms                    0
        zfs_condense_indirect_obsolete_pct                            25
        zfs_condense_indirect_vdevs_enable                             1
        zfs_condense_max_obsolete_bytes                       1073741824
        zfs_condense_min_mapping_bytes                            131072
        zfs_dbgmsg_enable                                              1
        zfs_dbgmsg_maxsize                                       4194304
        zfs_dbuf_state_index                                           0
        zfs_ddt_data_is_special                                        1
        zfs_deadman_checktime_ms                                   60000
        zfs_deadman_enabled                                            1
        zfs_deadman_failmode                                        wait
        zfs_deadman_synctime_ms                                   600000
        zfs_deadman_ziotime_ms                                    300000
        zfs_dedup_prefetch                                             0
        zfs_default_bs                                                 9
        zfs_default_ibs                                               17
        zfs_delay_min_dirty_percent                                   60
        zfs_delay_scale                                           500000
        zfs_delete_blocks                                          20480
        zfs_dirty_data_max                                    4294967296
        zfs_dirty_data_max_max                                4294967296
        zfs_dirty_data_max_max_percent                                25
        zfs_dirty_data_max_percent                                    10
        zfs_dirty_data_sync_percent                                   20
        zfs_disable_ivset_guid_check                                   0
        zfs_dmu_offset_next_sync                                       1
        zfs_embedded_slog_min_ms                                      64
        zfs_expire_snapshot                                          300
        zfs_fallocate_reserve_percent                                110
        zfs_flags                                                      0
        zfs_free_bpobj_enabled                                         1
        zfs_free_leak_on_eio                                           0
        zfs_free_min_time_ms                                        1000
        zfs_history_output_max                                   1048576
        zfs_immediate_write_sz                                     32768
        zfs_initialize_chunk_size                                1048576
        zfs_initialize_value                        16045690984833335022
        zfs_keep_log_spacemaps_at_export                               0
        zfs_key_max_salt_uses                                  400000000
        zfs_livelist_condense_new_alloc                                0
        zfs_livelist_condense_sync_cancel                              0
        zfs_livelist_condense_sync_pause                               0
        zfs_livelist_condense_zthr_cancel                              0
        zfs_livelist_condense_zthr_pause                               0
        zfs_livelist_max_entries                                  500000
        zfs_livelist_min_percent_shared                               75
        zfs_lua_max_instrlimit                                 100000000
        zfs_lua_max_memlimit                                   104857600
        zfs_max_async_dedup_frees                                 100000
        zfs_max_log_walking                                            5
        zfs_max_logsm_summary_length                                  10
        zfs_max_missing_tvds                                           0
        zfs_max_nvlist_src_size                                        0
        zfs_max_recordsize                                      16777216
        zfs_metaslab_find_max_tries                                  100
        zfs_metaslab_fragmentation_threshold                          70
        zfs_metaslab_max_size_cache_sec                             3600
        zfs_metaslab_mem_limit                                        25
        zfs_metaslab_segment_weight_enabled                            1
        zfs_metaslab_switch_threshold                                  2
        zfs_metaslab_try_hard_before_gang                              0
        zfs_mg_fragmentation_threshold                                95
        zfs_mg_noalloc_threshold                                       0
        zfs_min_metaslabs_to_flush                                     1
        zfs_multihost_fail_intervals                                  10
        zfs_multihost_history                                          0
        zfs_multihost_import_intervals                                20
        zfs_multihost_interval                                      1000
        zfs_multilist_num_sublists                                     0
        zfs_no_scrub_io                                                0
        zfs_no_scrub_prefetch                                          0
        zfs_nocacheflush                                               0
        zfs_nopwrite_enabled                                           1
        zfs_object_mutex_size                                         64
        zfs_obsolete_min_time_ms                                     500
        zfs_override_estimate_recordsize                               0
        zfs_pd_bytes_max                                        52428800
        zfs_per_txg_dirty_frees_percent                               30
        zfs_prefetch_disable                                           0
        zfs_read_history                                               0
        zfs_read_history_hits                                          0
        zfs_rebuild_max_segment                                  1048576
        zfs_rebuild_scrub_enabled                                      1
        zfs_rebuild_vdev_limit                                  67108864
        zfs_reconstruct_indirect_combinations_max                   4096
        zfs_recover                                                    0
        zfs_recv_queue_ff                                             20
        zfs_recv_queue_length                                   16777216
        zfs_recv_write_batch_size                                1048576
        zfs_removal_ignore_errors                                      0
        zfs_removal_suspend_progress                                   0
        zfs_remove_max_segment                                  16777216
        zfs_resilver_disable_defer                                     0
        zfs_resilver_min_time_ms                                    3000
        zfs_scan_blkstats                                              0
        zfs_scan_checkpoint_intval                                  7200
        zfs_scan_fill_weight                                           3
        zfs_scan_ignore_errors                                         0
        zfs_scan_issue_strategy                                        0
        zfs_scan_legacy                                                0
        zfs_scan_max_ext_gap                                     2097152
        zfs_scan_mem_lim_fact                                         20
        zfs_scan_mem_lim_soft_fact                                    20
        zfs_scan_report_txgs                                           0
        zfs_scan_strict_mem_lim                                        0
        zfs_scan_suspend_progress                                      0
        zfs_scan_vdev_limit                                     16777216
        zfs_scrub_min_time_ms                                       1000
        zfs_send_corrupt_data                                          0
        zfs_send_no_prefetch_queue_ff                                 20
        zfs_send_no_prefetch_queue_length                        1048576
        zfs_send_queue_ff                                             20
        zfs_send_queue_length                                   16777216
        zfs_send_unmodified_spill_blocks                               1
        zfs_slow_io_events_per_second                                 20
        zfs_spa_discard_memory_limit                            16777216
        zfs_special_class_metadata_reserve_pct                        25
        zfs_sync_pass_deferred_free                                    2
        zfs_sync_pass_dont_compress                                    8
        zfs_sync_pass_rewrite                                          2
        zfs_sync_taskq_batch_pct                                      75
        zfs_traverse_indirect_prefetch_limit                          32
        zfs_trim_extent_bytes_max                              134217728
        zfs_trim_extent_bytes_min                                  32768
        zfs_trim_metaslab_skip                                         0
        zfs_trim_queue_limit                                          10
        zfs_trim_txg_batch                                            32
        zfs_txg_history                                              100
        zfs_txg_timeout                                                5
        zfs_unflushed_log_block_max                               131072
        zfs_unflushed_log_block_min                                 1000
        zfs_unflushed_log_block_pct                                  400
        zfs_unflushed_log_txg_max                                   1000
        zfs_unflushed_max_mem_amt                             1073741824
        zfs_unflushed_max_mem_ppm                                   1000
        zfs_unlink_suspend_progress                                    0
        zfs_user_indirect_is_special                                   1
        zfs_vdev_aggregate_trim                                        0
        zfs_vdev_aggregation_limit                               1048576
        zfs_vdev_aggregation_limit_non_rotating                   131072
        zfs_vdev_async_read_max_active                                 3
        zfs_vdev_async_read_min_active                                 1
        zfs_vdev_async_write_active_max_dirty_percent                 60
        zfs_vdev_async_write_active_min_dirty_percent                 30
        zfs_vdev_async_write_max_active                               10
        zfs_vdev_async_write_min_active                                2
        zfs_vdev_cache_bshift                                         16
        zfs_vdev_cache_max                                         16384
        zfs_vdev_cache_size                                            0
        zfs_vdev_default_ms_count                                    200
        zfs_vdev_default_ms_shift                                     29
        zfs_vdev_initializing_max_active                               1
        zfs_vdev_initializing_min_active                               1
        zfs_vdev_max_active                                         1000
        zfs_vdev_max_auto_ashift                                      14
        zfs_vdev_min_auto_ashift                                       9
        zfs_vdev_min_ms_count                                         16
        zfs_vdev_mirror_non_rotating_inc                               0
        zfs_vdev_mirror_non_rotating_seek_inc                          1
        zfs_vdev_mirror_rotating_inc                                   0
        zfs_vdev_mirror_rotating_seek_inc                              5
        zfs_vdev_mirror_rotating_seek_offset                     1048576
        zfs_vdev_ms_count_limit                                   131072
        zfs_vdev_nia_credit                                            5
        zfs_vdev_nia_delay                                             5
        zfs_vdev_open_timeout_ms                                    1000
        zfs_vdev_queue_depth_pct                                    1000
        zfs_vdev_raidz_impl cycle [fastest] original scalar sse2 ssse3 avx2 avx512f avx512bw
        zfs_vdev_read_gap_limit                                    32768
        zfs_vdev_rebuild_max_active                                    3
        zfs_vdev_rebuild_min_active                                    1
        zfs_vdev_removal_max_active                                    2
        zfs_vdev_removal_min_active                                    1
        zfs_vdev_scheduler                                        unused
        zfs_vdev_scrub_max_active                                      3
        zfs_vdev_scrub_min_active                                      1
        zfs_vdev_sync_read_max_active                                 10
        zfs_vdev_sync_read_min_active                                 10
        zfs_vdev_sync_write_max_active                                10
        zfs_vdev_sync_write_min_active                                10
        zfs_vdev_trim_max_active                                       2
        zfs_vdev_trim_min_active                                       1
        zfs_vdev_write_gap_limit                                    4096
        zfs_vnops_read_chunk_size                                1048576
        zfs_wrlog_data_max                                    8589934592
        zfs_zevent_len_max                                           512
        zfs_zevent_retain_expire_secs                                900
        zfs_zevent_retain_max                                       2000
        zfs_zil_clean_taskq_maxalloc                             1048576
        zfs_zil_clean_taskq_minalloc                                1024
        zfs_zil_clean_taskq_nthr_pct                                 100
        zil_maxblocksize                                          131072
        zil_min_commit_timeout                                      5000
        zil_nocacheflush                                               0
        zil_replay_disable                                             0
        zil_slog_bulk                                             786432
        zio_deadman_log_all                                            0
        zio_dva_throttle_enabled                                       1
        zio_requeue_io_start_cut_in_line                               1
        zio_slow_io_ms                                             30000
        zio_taskq_batch_pct                                           80
        zio_taskq_batch_tpq                                            0
        zvol_inhibit_dev                                               0
        zvol_major                                                   230
        zvol_max_discard_blocks                                    16384
        zvol_prefetch_bytes                                       131072
        zvol_request_sync                                              0
        zvol_threads                                                  32
        zvol_volmode                                                   1


VDEV cache disabled, skipping section


ZIL committed transactions:                                         1.2M
        Commit requests:                                           92.6k
        Flushes to stable storage:                                 92.6k
        Transactions to SLOG storage pool:            0 Bytes          0
        Transactions to non-SLOG storage pool:      564.7 MiB      87.3k
 
Code:
PBS restore performance:

open block backend for target '/dev/rbd-pve/f080135b-af3a-40c9-9836-c7cd91b785d1/vm_storage/vm-5023-disk-1'
starting to restore snapshot 'vm/5023/2023-11-09T20:58:25Z'
download and verify backup index
progress 1% (read 1073741824 bytes, zeroes = 0% (0 bytes), duration 5 sec)
progress 2% (read 2147483648 bytes, zeroes = 22% (486539264 bytes), duration 9 sec)
progress 3% (read 3221225472 bytes, zeroes = 48% (1556086784 bytes), duration 9 sec)
progress 4% (read 4294967296 bytes, zeroes = 61% (2625634304 bytes), duration 9 sec)
progress 5% (read 5368709120 bytes, zeroes = 48% (2625634304 bytes), duration 12 sec)
progress 6% (read 6442450944 bytes, zeroes = 40% (2625634304 bytes), duration 16 sec)
[...]
[...]
progress 96% (read 103079215104 bytes, zeroes = 47% (49140465664 bytes), duration 278 sec)
progress 97% (read 104152956928 bytes, zeroes = 47% (49140465664 bytes), duration 284 sec)
progress 98% (read 105226698752 bytes, zeroes = 46% (49257906176 bytes), duration 289 sec)
progress 99% (read 106300440576 bytes, zeroes = 47% (50327453696 bytes), duration 289 sec)
progress 100% (read 107374182400 bytes, zeroes = 47% (51401195520 bytes), duration 289 sec)
restore image complete (bytes=107374182400, duration=289.24s, speed=354.04MB/s)
rescan volumes...
TASK OK

--

Code:
PBS backup performance:


INFO: started backup task 'a9d04e59-180e-4320-8efc-dfba8ed3991b'
INFO: resuming VM again
INFO: scsi0: dirty-bitmap status: existing bitmap was invalid and has been cleared
INFO: scsi1: dirty-bitmap status: existing bitmap was invalid and has been cleared
INFO:   0% (1.1 GiB of 124.0 GiB) in 3s, read: 368.0 MiB/s, write: 368.0 MiB/s
INFO:   3% (4.3 GiB of 124.0 GiB) in 6s, read: 1.1 GiB/s, write: 252.0 MiB/s
INFO:   4% (5.0 GiB of 124.0 GiB) in 9s, read: 252.0 MiB/s, write: 252.0 MiB/s
INFO:   5% (6.4 GiB of 124.0 GiB) in 15s, read: 244.7 MiB/s, write: 244.7 MiB/s
INFO:   6% (7.6 GiB of 124.0 GiB) in 20s, read: 248.8 MiB/s, write: 248.8 MiB/s
INFO:  11% (14.2 GiB of 124.0 GiB) in 23s, read: 2.2 GiB/s, write: 141.3 MiB/s
INFO:  22% (28.2 GiB of 124.0 GiB) in 26s, read: 4.7 GiB/s, write: 13.3 MiB/s
INFO:  27% (33.6 GiB of 124.0 GiB) in 29s, read: 1.8 GiB/s, write: 209.3 MiB/s
[...]
[...]
INFO:  82% (101.8 GiB of 124.0 GiB) in 2m 57s, read: 203.0 MiB/s, write: 184.0 MiB/s
INFO:  83% (103.1 GiB of 124.0 GiB) in 3m 1s, read: 328.0 MiB/s, write: 171.0 MiB/s
INFO:  86% (107.0 GiB of 124.0 GiB) in 3m 6s, read: 808.8 MiB/s, write: 156.8 MiB/s
INFO:  90% (112.3 GiB of 124.0 GiB) in 3m 9s, read: 1.8 GiB/s, write: 193.3 MiB/s
INFO:  93% (116.4 GiB of 124.0 GiB) in 3m 12s, read: 1.3 GiB/s, write: 101.3 MiB/s
INFO: 100% (124.0 GiB of 124.0 GiB) in 3m 15s, read: 2.5 GiB/s, write: 16.0 MiB/s
INFO: Waiting for server to finish backup validation...
INFO: backup is sparse: 68.13 GiB (54%) total zero data
INFO: backup was done incrementally, reused 68.14 GiB (54%)
INFO: transferred 124.00 GiB in 198 seconds (641.3 MiB/s)
INFO: adding notes to backup
INFO: Finished Backup of VM 5023 (00:03:19)
INFO: Backup finished at 2023-11-09 16:01:44
INFO: Backup job finished successfully
TASK OK
 
Sure, here it is.

PBS is a bare-metal install.

Code:
root@pbs:~# proxmox-backup-client benchmark --repository proxmox-backup
Uploaded 810 chunks in 5 seconds.
Time per request: 6186 microseconds.
TLS speed: 678.02 MB/s
SHA256 speed: 285.72 MB/s
Compression speed: 317.25 MB/s
Decompress speed: 513.92 MB/s
AES256/GCM speed: 1032.90 MB/s
Verify speed: 175.86 MB/s
┌───────────────────────────────────┬────────────────────┐
│ Name                              │ Value              │
╞═══════════════════════════════════╪════════════════════╡
│ TLS (maximal backup upload speed) │ 678.02 MB/s (55%)  │
├───────────────────────────────────┼────────────────────┤
│ SHA256 checksum computation speed │ 285.72 MB/s (14%)  │
├───────────────────────────────────┼────────────────────┤
│ ZStd level 1 compression speed    │ 317.25 MB/s (42%)  │
├───────────────────────────────────┼────────────────────┤
│ ZStd level 1 decompression speed  │ 513.92 MB/s (43%)  │
├───────────────────────────────────┼────────────────────┤
│ Chunk verification speed          │ 175.86 MB/s (23%)  │
├───────────────────────────────────┼────────────────────┤
│ AES256 GCM encryption speed       │ 1032.90 MB/s (28%) │
└───────────────────────────────────┴────────────────────┘

Code:
root@pbs:/stornado# fio --name=rand-write --ioengine=posixaio --rw=randwrite --bs=4K --size=4g --numjobs=1 --iodepth=1 --runtime=60 --time_based --end_fsync=1
rand-write: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
fio-3.33
Starting 1 process
Jobs: 1 (f=1): [F(1)][100.0%][w=628KiB/s][w=157 IOPS][eta 00m:00s]
rand-write: (groupid=0, jobs=1): err= 0: pid=3475270: Tue Nov 14 13:08:16 2023
  write: IOPS=306, BW=1227KiB/s (1257kB/s)(73.9MiB/61687msec); 0 zone resets
    slat (nsec): min=709, max=84809, avg=1938.30, stdev=1292.41
    clat (usec): min=12, max=84016, avg=3167.31, stdev=7428.58
     lat (usec): min=14, max=84018, avg=3169.25, stdev=7428.67
    clat percentiles (usec):
     |  1.00th=[   15],  5.00th=[   16], 10.00th=[   17], 20.00th=[   19],
     | 30.00th=[   21], 40.00th=[   35], 50.00th=[  176], 60.00th=[  619],
     | 70.00th=[  783], 80.00th=[ 4817], 90.00th=[ 6194], 95.00th=[23987],
     | 99.00th=[31589], 99.50th=[34341], 99.90th=[46924], 99.95th=[49546],
     | 99.99th=[68682]
   bw (  KiB/s): min=  128, max= 5040, per=100.00%, avg=1238.29, stdev=1207.79, samples=119
   iops        : min=   32, max= 1260, avg=309.57, stdev=301.95, samples=119
  lat (usec)   : 20=29.21%, 50=13.33%, 100=0.29%, 250=9.70%, 500=4.55%
  lat (usec)   : 750=9.37%, 1000=11.85%
  lat (msec)   : 2=0.26%, 4=0.01%, 10=12.48%, 20=1.35%, 50=7.56%
  lat (msec)   : 100=0.04%
  cpu          : usr=0.12%, sys=0.12%, ctx=18952, majf=0, minf=22
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,18925,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1


Run status group 0 (all jobs):
  WRITE: bw=1227KiB/s (1257kB/s), 1227KiB/s-1227KiB/s (1257kB/s-1257kB/s), io=73.9MiB (77.5MB), run=61687-61687msec


Code:
root@pbs:/stornado# fio --name=rand-read --ioengine=posixaio --rw=randread --bs=4K --size=4g --numjobs=1 --iodepth=1 --runtime=60 --time_based --end_fsync=1
rand-read: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
fio-3.33
Starting 1 process
Jobs: 1 (f=1): [r(1)][100.0%][r=84.3MiB/s][r=21.6k IOPS][eta 00m:00s]
rand-read: (groupid=0, jobs=1): err= 0: pid=3485077: Tue Nov 14 13:12:18 2023
  read: IOPS=19.3k, BW=75.4MiB/s (79.1MB/s)(4524MiB/60001msec)
    slat (nsec): min=626, max=149024, avg=1361.37, stdev=246.50
    clat (usec): min=6, max=32331, avg=49.91, stdev=214.79
     lat (usec): min=10, max=32332, avg=51.27, stdev=214.80
    clat percentiles (usec):
     |  1.00th=[   11],  5.00th=[   12], 10.00th=[   12], 20.00th=[   12],
     | 30.00th=[   12], 40.00th=[   12], 50.00th=[   12], 60.00th=[   13],
     | 70.00th=[   14], 80.00th=[  145], 90.00th=[  149], 95.00th=[  151],
     | 99.00th=[  157], 99.50th=[  161], 99.90th=[ 5604], 99.95th=[ 5997],
     | 99.99th=[ 6128]
   bw (  KiB/s): min= 1960, max=263656, per=99.92%, avg=77149.92, stdev=37571.31, samples=119
   iops        : min=  490, max=65914, avg=19287.51, stdev=9392.84, samples=119
  lat (usec)   : 10=0.01%, 20=77.23%, 50=0.14%, 100=0.01%, 250=22.50%
  lat (usec)   : 500=0.01%, 750=0.01%
  lat (msec)   : 2=0.01%, 10=0.12%, 20=0.01%, 50=0.01%
  cpu          : usr=4.21%, sys=4.65%, ctx=1158328, majf=0, minf=21
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=1158155,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1


Run status group 0 (all jobs):
   READ: bw=75.4MiB/s (79.1MB/s), 75.4MiB/s-75.4MiB/s (79.1MB/s-79.1MB/s), io=4524MiB (4744MB), run=60001-60001msec
 
IMO, cross tests are needed:
try a restore to one node's local datastore, to exclude Ceph.
Then try an ext4 datastore on PBS and restore from that, to exclude PBS's ZFS. (Rough sketch of both below.)
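
A rough sketch of what I mean, using the stock tools (proxmox-backup-manager and qmrestore); the device, mountpoint, datastore name, VMID and target storage are placeholders, adjust them to whatever spare disk and storage you have:

Code:
# on the PBS box: throwaway datastore on a non-ZFS filesystem (placeholder device /dev/sdX)
mkfs.ext4 /dev/sdX
mkdir -p /mnt/ext4test
mount /dev/sdX /mnt/ext4test
proxmox-backup-manager datastore create ext4test /mnt/ext4test

# on a PVE node: restore the same snapshot once to local (non-Ceph) storage and once
# to the Ceph pool, then compare the task speeds. List the exact backup volid first:
pvesm list <pbs-storage>
qmrestore <pbs-storage>:backup/vm/5023/<timestamp> 9999 --storage local-lvm

Then back up a test VM to the ext4 datastore and restore from it the same way; that should separate the PBS/ZFS read side from the Ceph write side.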
 
I can do those tests, however the local storage on the Proxmox node is not NVMe, but SSD. I think I have a spare NVMe drive I could test with. But as you can see from the results above, with Ceph/NVMe I easily get 1200 MB/s on the current storage when testing from the node, so the issue doesn't seem to be with Ceph.

On the Proxmox Backup Server box, same thing: the local boot disk is a mirror of some OEM (HDSTOR) SSD drives that I can't even find technical specs for. I don't have a spare Micron 5400 PRO SSD.

So far, do you see any results that look abnormal?

In your opinion, what kind of performance should I be getting while restoring?
 
I don't know what performance you can get; I don't have those beast servers here :confused:.
 
Code:
┌───────────────────────────────────┬────────────────────┐
│ Name                              │ Value              │
╞═══════════════════════════════════╪════════════════════╡
│ TLS (maximal backup upload speed) │ 678.02 MB/s (55%)  │
├───────────────────────────────────┼────────────────────┤
│ SHA256 checksum computation speed │ 285.72 MB/s (14%)  │
├───────────────────────────────────┼────────────────────┤
│ ZStd level 1 compression speed    │ 317.25 MB/s (42%)  │
├───────────────────────────────────┼────────────────────┤
│ ZStd level 1 decompression speed  │ 513.92 MB/s (43%)  │
├───────────────────────────────────┼────────────────────┤
│ Chunk verification speed          │ 175.86 MB/s (23%)  │
├───────────────────────────────────┼────────────────────┤
│ AES256 GCM encryption speed       │ 1032.90 MB/s (28%) │
└───────────────────────────────────┴────────────────────┘
I could be wrong, but it looks to me like your verification, checksum and compression speeds are your bottlenecks. If your disks can write at 1200 MB/s but you can only feed them data at ~300 MB/s, they aren't going to write any faster than that. Is your CPU usage maxed out when doing these tests?
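
A quick way to check that while a restore or benchmark is running is to watch per-thread CPU; something like the following (plain procps/sysstat tooling, nothing PBS-specific; the PBS-side daemon is proxmox-backup-proxy):

Code:
# on the PVE node during the restore: look for a single thread pinned near 100%
top -H -o %CPU

# or sample per-thread CPU every 2 seconds with sysstat
pidstat -t -u 2

# same idea on the PBS box; filter on the backup proxy
# (kernel command names get truncated, so match the prefix)
pidstat -t -u 2 -C proxmox-backup

If one thread sits at ~100% while the rest are idle, the restore is CPU-bound on a single core rather than disk- or network-bound.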
 
I don't think the 354 MB/s is that bad; the total time of just under 300 seconds is reasonable.

Backups are always incremental, which also means that PBS may have to assemble chunks from, and transfer, several backups. PVE then also verifies the backup and the index again. All of this costs performance. The PVE node doesn't have to do that work when backing up; PBS does.
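
For what it's worth, a rough back-of-envelope from the restore log above, just re-deriving the reported figure from the byte/second values in the task output (bc arithmetic; I'm assuming the task's "MB/s" is actually computed as MiB/s, which is what the numbers suggest):

Code:
# 107374182400 bytes (100 GiB) in 289.24 s -> matches the 354.04 "MB/s" from the log
echo "scale=2; 107374182400 / 1048576 / 289.24" | bc                    # ~354 MiB/s

# ~47% of the image was zeroes, which don't have to be fetched or decompressed,
# so the chunk data actually processed works out to roughly:
echo "scale=2; (107374182400 - 51401195520) / 1048576 / 289.24" | bc    # ~184 MiB/s

So the effective non-zero restore rate is in the same ballpark as the single-threaded verify/SHA256 figures from the proxmox-backup-client benchmark, which would fit the single-core bottleneck picture suggested above.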
 
