Hello,
We are investigating "hanging" backup jobs in our environment. We have 7 PVE nodes that back up to 1 PBS server, and sometimes backup jobs hang at "Waiting for server to finish backup validation..."
Configuration of PBS:
- AMD EPYC 7313P
- mdadm RAID5 across 12x Samsung PM9A3 15.36 TB (NVMe)
- stripe_cache_size = 8192
- group_thread_cnt = 8
- using raw LVM
- mounted 75TB logical volume with ext4
- server is in same network as PVE, no firewall
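For anyone who wants to compare the two md tunables above on their own array, here is a minimal sketch that reads them from sysfs and estimates the stripe cache memory with the formula from the kernel md documentation (page size x number of disks x stripe_cache_size). The device name `md0` and the 4 KiB page size are assumptions, not values taken from this setup.

```python
from pathlib import Path


def read_md_tunables(md_dir):
    """Return stripe_cache_size and group_thread_cnt from an md sysfs directory."""
    md_dir = Path(md_dir)
    names = ("stripe_cache_size", "group_thread_cnt")
    return {name: int((md_dir / name).read_text().strip()) for name in names}


def stripe_cache_bytes(stripe_cache_size, nr_disks, page_size=4096):
    """Memory used by the RAID5 stripe cache: page_size * nr_disks * entries."""
    return page_size * nr_disks * stripe_cache_size


if __name__ == "__main__":
    # "md0" is an assumed device name; adjust it to the real array.
    sysfs = Path("/sys/block/md0/md")
    if sysfs.is_dir():
        print(read_md_tunables(sysfs))
    # With the values from this post: 8192 entries, 12 disks, 4 KiB pages.
    print(stripe_cache_bytes(8192, 12) // (1024 * 1024), "MiB")
```

With the settings listed above, the stripe cache works out to 384 MiB of RAM under those assumptions.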
Datastore size: 720 groups, 7 days retention, on-disk usage 41.895 TiB, on-disk chunks 19077285.
Part of the backup log:
INFO: 43% (5.4 GiB of 12.4 GiB) in 6s, read: 945.3 MiB/s, write: 150.7 MiB/s
INFO: 65% (8.2 GiB of 12.4 GiB) in 9s, read: 953.3 MiB/s, write: 132.0 MiB/s
INFO: 91% (11.4 GiB of 12.4 GiB) in 12s, read: 1.1 GiB/s, write: 89.3 MiB/s
INFO: 100% (12.4 GiB of 12.4 GiB) in 15s, read: 361.3 MiB/s, write: 53.3 MiB/s
INFO: Waiting for server to finish backup validation...
INFO: backup is sparse: 1.25 GiB (10%) total zero data
INFO: backup was done incrementally, reused 28.34 GiB (94%)
INFO: transferred 12.45 GiB in 123 seconds (103.6 MiB/s)
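To put a number on the stall, the two timings in the log can be subtracted. This is a rough sketch: it assumes the "in 15s" progress timer and the "123 seconds" total start at about the same moment, which the log itself does not confirm.

```python
import re

# Two lines copied from the backup log above.
LOG = """\
INFO: 100% (12.4 GiB of 12.4 GiB) in 15s, read: 361.3 MiB/s, write: 53.3 MiB/s
INFO: transferred 12.45 GiB in 123 seconds (103.6 MiB/s)
"""


def stall_seconds(log):
    """Seconds between reaching 100% and the end of the job."""
    done = re.search(r"100% \(.*?\) in (\d+)s", log)
    total = re.search(r"transferred .*? in (\d+) seconds", log)
    return int(total.group(1)) - int(done.group(1))


print(stall_seconds(LOG))  # 108
```

Under that assumption, roughly 108 of the 123 seconds are spent after the data has been sent, which is why the reported average drops to 103.6 MiB/s.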
As you can see, the problem does not seem to be with writing or networking, but with the "confirmation" of the backup.
At the moment I am looking at the sync level and chunk order tuning options. When we change the sync level to none, the next few backups are fast, but within a few minutes we are back where we started (probably the disk cache filling up?). What about chunk order? I read about it in the docs, but I do not understand it well.
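My current reading of the docs, which may be wrong, is that chunk order has the values none and inode, and that inode means chunk files are sorted by their filesystem inode number before being read (for verify and tape jobs, not for the backup write path). The idea can be sketched like this; `order_by_inode` is a hypothetical helper for illustration, not PBS code.

```python
import os


def order_by_inode(paths):
    """Sort file paths by inode number so reads follow on-disk allocation order."""
    return sorted(paths, key=lambda p: os.stat(p).st_ino)


if __name__ == "__main__":
    # Demo on the current directory; a real datastore would use its chunk files.
    files = [e.path for e in os.scandir(".") if e.is_file()]
    for path in order_by_inode(files):
        print(os.stat(path).st_ino, path)
```

If that reading is right, the setting mainly helps spinning disks by reducing seeks, so it would probably matter little on NVMe and would not explain the wait at the end of a backup.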
Does anyone have any ideas on how to tune the storage server?