Proxmox-Backup-Server 2.3.1-1 stuck after backup

I could run another test where I set `sync=always` on the ZFS dataset backing the datastore, to see if that makes a difference. This should have the same effect as setting the PBS sync-level to `file`.

But please tell me whether this would be helpful before I change my setup ;-)
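
For reference, the test described above could look roughly like this. It is a minimal sketch that assumes the dataset name `unsecured/backup` from the `zfs get all` listing in this thread, and it only prints the commands unless `DRY_RUN=0` is set, since they change a production pool. The change is undone with `zfs inherit sync`, which drops the local value so the dataset falls back to the inherited default (`standard`).

```shell
#!/usr/bin/env bash
# Sketch: force synchronous writes on the dataset backing the PBS datastore
# for a test backup, then restore the previous behaviour.
# Assumption: dataset name taken from the `zfs get all` listing in this thread.
set -euo pipefail

DATASET="${DATASET:-unsecured/backup}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print commands, 0 = actually run them

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [[ "$DRY_RUN" == "1" ]]; then
        echo "+ $*"
    else
        "$@"
    fi
}

enable_sync_always() {
    run zfs set sync=always "$DATASET"
    run zfs get sync "$DATASET"
}

restore_sync_default() {
    # Removing the local value makes the dataset inherit "sync" again
    # (from the parent, or the built-in default "standard").
    run zfs inherit sync "$DATASET"
    run zfs get sync "$DATASET"
}

if [[ $# -gt 0 ]]; then
    case "$1" in
        on)  enable_sync_always ;;
        off) restore_sync_default ;;
        *)   echo "usage: $0 on|off" >&2; exit 2 ;;
    esac
fi
```

Usage would be something like `DRY_RUN=0 bash sync-test.sh on` before the backup and `DRY_RUN=0 bash sync-test.sh off` afterwards (the script name is hypothetical).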
 
See the linked commit: this was actually tested on HDDs with ZFS, with little negative effect. Did you customize any ZFS-related settings (like module parameters)?
 
`sync=always` (or sync-level `file` on PBS) will likely reduce throughput (a lot on systems like yours!), but then there is nothing left to flush at the end of the backup, so you shouldn't see a "hang" there.
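
On the PBS side, the corresponding knob is the datastore's `sync-level` tuning option (`none`, `filesystem` or `file`, with `filesystem` being the default). A sketch of switching it for a test run, assuming a datastore named `backup` (a placeholder, since the real datastore name isn't given in this thread); like the other sketches it only prints the command unless `DRY_RUN=0`.

```shell
#!/usr/bin/env bash
# Sketch: change the sync-level tuning option of a PBS datastore.
# Assumption: STORE is a placeholder datastore name; replace it with yours.
set -euo pipefail

STORE="${STORE:-backup}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the command, 0 = run it

run() {
    if [[ "$DRY_RUN" == "1" ]]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Accepts the three levels: none, filesystem (default), file.
set_sync_level() {
    local level="$1"
    case "$level" in
        none|filesystem|file) ;;
        *) echo "invalid sync-level: $level" >&2; return 1 ;;
    esac
    run proxmox-backup-manager datastore update "$STORE" --tuning "sync-level=$level"
}

if [[ $# -gt 0 ]]; then
    set_sync_level "$1"
fi
```

If other tuning options are already set on the datastore, they would need to go into the same comma-separated `--tuning` string.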
 

Yes, I did see the test results, and I agree that this change should only have a minor impact. The real-world case looks different, though :-D

I did not change any settings on the pool. It has been carried over through many generations of OpenZFS, so some properties (mostly the feature flags) now show "local" as their source. Take a look:

Code:
zpool get all unsecured
NAME       PROPERTY                       VALUE                          SOURCE
unsecured  size                           3.62T                          -
unsecured  capacity                       72%                            -
unsecured  altroot                        -                              default
unsecured  health                         ONLINE                         -
unsecured  guid                           14761943016502656793           -
unsecured  version                        -                              default
unsecured  bootfs                         -                              default
unsecured  delegation                     on                             default
unsecured  autoreplace                    off                            default
unsecured  cachefile                      none                           local
unsecured  failmode                       wait                           default
unsecured  listsnapshots                  off                            default
unsecured  autoexpand                     off                            default
unsecured  dedupratio                     1.00x                          -
unsecured  free                           1006G                          -
unsecured  allocated                      2.64T                          -
unsecured  readonly                       off                            -
unsecured  ashift                         0                              default
unsecured  comment                        -                              default
unsecured  expandsize                     -                              -
unsecured  freeing                        0                              -
unsecured  fragmentation                  10%                            -
unsecured  leaked                         0                              -
unsecured  multihost                      off                            default
unsecured  checkpoint                     -                              -
unsecured  load_guid                      13032513006772999619           -
unsecured  autotrim                       off                            default
unsecured  compatibility                  off                            default
unsecured  feature@async_destroy          enabled                        local
unsecured  feature@empty_bpobj            active                         local
unsecured  feature@lz4_compress           active                         local
unsecured  feature@multi_vdev_crash_dump  enabled                        local
unsecured  feature@spacemap_histogram     active                         local
unsecured  feature@enabled_txg            active                         local
unsecured  feature@hole_birth             active                         local
unsecured  feature@extensible_dataset     active                         local
unsecured  feature@embedded_data          active                         local
unsecured  feature@bookmarks              enabled                        local
unsecured  feature@filesystem_limits      enabled                        local
unsecured  feature@large_blocks           enabled                        local
unsecured  feature@large_dnode            enabled                        local
unsecured  feature@sha512                 enabled                        local
unsecured  feature@skein                  enabled                        local
unsecured  feature@edonr                  enabled                        local
unsecured  feature@userobj_accounting     active                         local
unsecured  feature@encryption             enabled                        local
unsecured  feature@project_quota          active                         local
unsecured  feature@device_removal         enabled                        local
unsecured  feature@obsolete_counts        enabled                        local
unsecured  feature@zpool_checkpoint       enabled                        local
unsecured  feature@spacemap_v2            active                         local
unsecured  feature@allocation_classes     enabled                        local
unsecured  feature@resilver_defer         enabled                        local
unsecured  feature@bookmark_v2            enabled                        local
unsecured  feature@redaction_bookmarks    enabled                        local
unsecured  feature@redacted_datasets      enabled                        local
unsecured  feature@bookmark_written       enabled                        local
unsecured  feature@log_spacemap           active                         local
unsecured  feature@livelist               enabled                        local
unsecured  feature@device_rebuild         enabled                        local
unsecured  feature@zstd_compress          enabled                        local
unsecured  feature@draid                  enabled                        local


And here are the current properties of the dataset backing the datastore (it has been recreated several times for testing):

Code:
zfs get all unsecured/backup
NAME              PROPERTY              VALUE                  SOURCE
unsecured/backup  type                  filesystem             -
unsecured/backup  creation              Thu Sep 29 16:55 2022  -
unsecured/backup  used                  1.55T                  -
unsecured/backup  available             891G                   -
unsecured/backup  referenced            1.55T                  -
unsecured/backup  compressratio         1.00x                  -
unsecured/backup  mounted               yes                    -
unsecured/backup  quota                 none                   default
unsecured/backup  reservation           none                   default
unsecured/backup  recordsize            128K                   default
unsecured/backup  mountpoint            /mnt/unsecured/backup  local
unsecured/backup  sharenfs              off                    default
unsecured/backup  checksum              on                     default
unsecured/backup  compression           off                    default
unsecured/backup  atime                 on                     default
unsecured/backup  devices               on                     default
unsecured/backup  exec                  on                     default
unsecured/backup  setuid                on                     default
unsecured/backup  readonly              off                    default
unsecured/backup  zoned                 off                    default
unsecured/backup  snapdir               hidden                 default
unsecured/backup  aclmode               discard                default
unsecured/backup  aclinherit            restricted             default
unsecured/backup  createtxg             11                     -
unsecured/backup  canmount              on                     default
unsecured/backup  xattr                 on                     default
unsecured/backup  copies                1                      default
unsecured/backup  version               5                      -
unsecured/backup  utf8only              off                    -
unsecured/backup  normalization         none                   -
unsecured/backup  casesensitivity       sensitive              -
unsecured/backup  vscan                 off                    default
unsecured/backup  nbmand                off                    default
unsecured/backup  sharesmb              off                    default
unsecured/backup  refquota              none                   default
unsecured/backup  refreservation        none                   default
unsecured/backup  guid                  14419582613216684564   -
unsecured/backup  primarycache          all                    default
unsecured/backup  secondarycache        all                    default
unsecured/backup  usedbysnapshots       0B                     -
unsecured/backup  usedbydataset         1.55T                  -
unsecured/backup  usedbychildren        0B                     -
unsecured/backup  usedbyrefreservation  0B                     -
unsecured/backup  logbias               latency                default
unsecured/backup  objsetid              272                    -
unsecured/backup  dedup                 off                    default
unsecured/backup  mlslabel              none                   default
unsecured/backup  sync                  standard               default
unsecured/backup  dnodesize             legacy                 default
unsecured/backup  refcompressratio      1.00x                  -
unsecured/backup  written               1.55T                  -
unsecured/backup  logicalused           1.54T                  -
unsecured/backup  logicalreferenced     1.54T                  -
unsecured/backup  volmode               default                default
unsecured/backup  filesystem_limit      none                   default
unsecured/backup  snapshot_limit        none                   default
unsecured/backup  filesystem_count      none                   default
unsecured/backup  snapshot_count        none                   default
unsecured/backup  snapdev               hidden                 default
unsecured/backup  acltype               off                    default
unsecured/backup  context               none                   default
unsecured/backup  fscontext             none                   default
unsecured/backup  defcontext            none                   default
unsecured/backup  rootcontext           none                   default
unsecured/backup  relatime              off                    default
unsecured/backup  redundant_metadata    all                    default
unsecured/backup  overlay               on                     default
unsecured/backup  encryption            off                    default
unsecured/backup  keylocation           none                   default
unsecured/backup  keyformat             none                   default
unsecured/backup  pbkdf2iters           0                      default
unsecured/backup  special_small_blocks  0                      default
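
To answer the "did you customize anything" question more quickly, listings like these can be filtered down to explicitly set properties. A small sketch that reads the tab-separated output of `zpool get -H` / `zfs get -H` and keeps only rows whose source is `local`, hiding the `feature@` rows by default because every one of them shows `local` in the pool listing above. The function name `local_props` is made up for illustration. Note that neither listing covers ZFS module parameters, which live outside the pool.

```shell
#!/usr/bin/env bash
# Sketch: list only explicitly set ("local") properties from the
# tab-separated output of `zpool get -H ...` or `zfs get -H ...`.
set -euo pipefail

# stdin: NAME<TAB>PROPERTY<TAB>VALUE<TAB>SOURCE (the default -H columns)
# $1:    1 (default) hides pool feature flags, 0 shows them too
local_props() {
    local skip_features="${1:-1}"
    awk -F'\t' -v skip="$skip_features" '
        $4 == "local" && !(skip == 1 && $2 ~ /^feature@/) { print $2 "=" $3 }
    '
}

# Example use on the PBS host (pool/dataset names from this thread):
#   zpool get -H all unsecured | local_props
#   zfs get -H all unsecured/backup | local_props
```

Applied to the two listings above, this would leave only `cachefile=none` for the pool and `mountpoint=/mnt/unsecured/backup` for the dataset. For datasets, `zfs get -s local all unsecured/backup` does the same filtering natively.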

I could also run the same tests on a ZFS mirror of two NVMe SSDs. Would that help?
 
I was wondering more about things like transaction group (txg) timeouts, ARC settings and the like. It might be interesting to monitor `zpool iostat` during the backup with syncfs enabled (the default sync-level, `filesystem`) and/or see how it performs with sync-level `file`.
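
The txg timeout and ARC limits are ZFS module parameters, so they don't show up in `zpool get` or `zfs get`. A sketch that prints a few of them from sysfs, plus any `options zfs ...` overrides in `/etc/modprobe.d`. The selection of parameters is an illustrative pick, not an exhaustive list, and `PARAM_DIR` only exists so the sketch can be tried without the zfs module loaded.

```shell
#!/usr/bin/env bash
# Sketch: show selected ZFS module parameters and any modprobe overrides.
# The parameter list is an illustrative selection, not exhaustive.
set -euo pipefail

PARAM_DIR="${PARAM_DIR:-/sys/module/zfs/parameters}"
PARAMS=(zfs_txg_timeout zfs_arc_min zfs_arc_max zfs_dirty_data_max zfs_dirty_data_max_percent)

# Print name=value for each parameter, or a marker if it can't be read.
# A value of 0 for zfs_arc_min/zfs_arc_max means the built-in default is used.
show_params() {
    local p
    for p in "${PARAMS[@]}"; do
        if [[ -r "$PARAM_DIR/$p" ]]; then
            printf '%s=%s\n' "$p" "$(<"$PARAM_DIR/$p")"
        else
            printf '%s=<not available>\n' "$p"
        fi
    done
}

# Lines such as "options zfs zfs_arc_max=..." indicate customized values.
show_overrides() {
    grep -hsE '^[[:space:]]*options[[:space:]]+zfs[[:space:]]' /etc/modprobe.d/*.conf || true
}

show_params
show_overrides
```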
 
Ah OK, got it. I will run those tests over the weekend and report back.

There are also some special ZFS settings that I will take a closer look at.
 
I would take a look at the following during the sync/hang phase:

Code:
zpool iostat -w 1
zpool iostat -lv 1
zpool iostat -r 1
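
To line the numbers up with the moment the backup appears to hang, the three views can be recorded side by side. A sketch, assuming the pool name `unsecured` and a log directory under `/tmp` (both adjustable via `POOL`/`LOG_DIR`). `-T d` adds a date stamp to each report, and the count argument makes each `zpool iostat` exit on its own, so nothing has to be killed afterwards. `ZPOOL` only exists so the command can be swapped out for testing.

```shell
#!/usr/bin/env bash
# Sketch: record three `zpool iostat` views in parallel for a fixed number
# of one-second reports, each into its own log file.
# Assumptions: pool name from this thread, log directory under /tmp.
set -euo pipefail

POOL="${POOL:-unsecured}"
LOG_DIR="${LOG_DIR:-/tmp/iostat-$(date +%Y%m%d-%H%M%S)}"
ZPOOL="${ZPOOL:-zpool}"   # overridable only to make the sketch testable

# Log file name -> zpool iostat flag:
#   w = latency histograms, lv = per-vdev latency stats, r = request size histograms
declare -A VIEWS=([w]="-w" [lv]="-lv" [r]="-r")

capture() {
    local seconds="$1" name
    mkdir -p "$LOG_DIR"
    for name in "${!VIEWS[@]}"; do
        # -T d prefixes every report with a date stamp; "1 $seconds" means a
        # one-second interval and $seconds reports, after which iostat exits.
        "$ZPOOL" iostat -T d "${VIEWS[$name]}" "$POOL" 1 "$seconds" \
            > "$LOG_DIR/$name.log" 2>&1 &
    done
    wait
    echo "logs written to $LOG_DIR"
}

if [[ $# -gt 0 ]]; then
    capture "$1"
fi
```

For example, start `bash iostat-capture.sh 900 &` on the PBS host shortly before the backup begins (the script name is hypothetical).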

Maybe one of the drives is slow to respond, or there is another issue.

Besides PBS, are there any other sync writers active on the same storage at the same time?