Hey everyone,
We're experiencing slow garbage collection (GC). While it runs, backups take longer than expected to complete, and in some cases they fail.
Our setup is not ideal, as we're using network attached storage without SSDs, but it worked fine for 1-2 months, so we're unsure what is causing the issue. The datastore lives on network attached storage (mounted under /mnt), which points to a NAS with Synology HAT5300-16T drives that is connected locally to our dedicated Proxmox Backup Server. The server itself runs on dedicated hardware, but has no storage other than the 2 drives used for the OS.
On the datastore, we are currently using ~60TB of storage (out of ~112TB), and GC is taking upwards of 30 days to complete. The first phase is quite quick, but the second phase only seems to get through ~3% a day. While GC is running, our backups take roughly 4x as long to complete, with one particular VM going from 30 minutes to 2 hours. Some VMs, particularly the large ones (several TB in size), also seem to go offline for 1-2 minutes when a backup starts while GC is running.
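To check whether per-file metadata access on the NAS is the bottleneck, something like the rough sketch below could be used to time `stat` calls on a sample of chunk files. This is only an illustration: the function names are made up for this post, and it assumes the chunks live in a `.chunks` directory under the datastore root (which the task log suggests is `/mnt/nas`). I have not verified that this matches what GC does internally, and results can be skewed by kernel caching of recently accessed files.

```python
import os
import time
from itertools import islice


def iter_files(root):
    """Yield file paths under root lazily, without building a full listing."""
    stack = [root]
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                else:
                    yield entry.path


def sample_stat_latency(root, limit=10000):
    """Time os.stat() on up to `limit` files and return summary figures."""
    timings = []
    for path in islice(iter_files(root), limit):
        start = time.perf_counter()
        os.stat(path)  # one metadata round trip per file
        timings.append(time.perf_counter() - start)
    if not timings:
        return {"files": 0, "mean_ms": 0.0, "max_ms": 0.0, "stats_per_sec": 0.0}
    total = sum(timings)
    return {
        "files": len(timings),
        "mean_ms": 1000 * total / len(timings),
        "max_ms": 1000 * max(timings),
        "stats_per_sec": len(timings) / total if total > 0 else float("inf"),
    }
```

It could be run as `print(sample_stat_latency("/mnt/nas/.chunks"))` (path assumed), once while GC is idle and once while it is running, to compare the numbers.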
I do not believe the dedicated server itself is the issue, as its hardware is more than sufficient (overkill) and we're not seeing it get anywhere near using all of its resources. My suspicion is that network attached storage combined with slow(ish) HDDs is our problem. However, as this was previously working fine, I was not expecting any issues. Could the number of chunks be causing problems now? The last GC reported 42198447 chunks.
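To put those numbers in perspective, here is a quick back-of-the-envelope calculation. It assumes the ~3% a day figure refers to the share of the 42198447 chunks that phase two gets through, which is my reading of the progress output and may not be exact; the function name is just for illustration.

```python
TOTAL_CHUNKS = 42_198_447   # chunk count reported by the last GC
PERCENT_PER_DAY = 3.0       # observed phase-two progress per day (approximate)
SECONDS_PER_DAY = 86_400


def gc_phase2_estimate(total_chunks, percent_per_day):
    """Convert a percent-per-day progress rate into chunks/second and total days."""
    chunks_per_day = total_chunks * percent_per_day / 100
    return {
        "chunks_per_day": chunks_per_day,
        "chunks_per_second": chunks_per_day / SECONDS_PER_DAY,
        "days_to_complete": 100 / percent_per_day,
    }


estimate = gc_phase2_estimate(TOTAL_CHUNKS, PERCENT_PER_DAY)
print(f"{estimate['chunks_per_second']:.1f} chunks/s, "
      f"{estimate['days_to_complete']:.0f} days for a full pass")
```

Under that assumption, phase two is only handling about 15 chunks per second, and a full pass takes roughly 33 days, which lines up with the 30+ days we are seeing.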
We previously ran into this exact issue and worked around it by creating a new datastore on a separate NAS and storing our backups there. That NAS had very similar specs, and the move did resolve the issue briefly. However, after 1-2 months we started seeing the exact same problem again, so it seems to be reproducible. Again, my suspicion is that the number of stored chunks is the cause, and that network attached storage with HDDs isn't helping. Does anyone have suggestions on how we can resolve this without having to purchase SSDs?
We are also noticing some backups fail with the task log showing:
Code:
2025-09-22T06:02:28+10:00: backup ended and finish failed: backup ended but finished flag is not set.
2025-09-22T06:02:28+10:00: removing unfinished backup
2025-09-22T06:02:28+10:00: removing backup snapshot "/mnt/nas/ns/6Hour/vm/297/2025-09-21T20:00:01Z"
2025-09-22T06:02:28+10:00: TASK ERROR: removing backup snapshot "/mnt/nas/ns/6Hour/vm/297/2025-09-21T20:00:01Z" failed - Directory not empty (os error 39)
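Since the error is "Directory not empty", it may help to see what is still sitting in the snapshot directory when the removal fails. Below is a small sketch for that; the function name is invented for this post, and it simply lists whatever files remain, without assuming anything about what PBS expects to find there.

```python
import os
from datetime import datetime, timezone


def list_leftovers(snapshot_dir):
    """Return (relative path, size in bytes, mtime as UTC ISO string) per remaining file."""
    leftovers = []
    if not os.path.isdir(snapshot_dir):
        return leftovers  # directory already gone or not reachable
    for dirpath, _dirnames, filenames in os.walk(snapshot_dir):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                info = os.lstat(full)
            except OSError:
                continue  # file vanished between listing and stat
            mtime = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
            leftovers.append(
                (os.path.relpath(full, snapshot_dir), info.st_size, mtime.isoformat())
            )
    return sorted(leftovers)
```

For the failed task above, this would be called with the path from the log, e.g. `list_leftovers("/mnt/nas/ns/6Hour/vm/297/2025-09-21T20:00:01Z")`. On NFS, leftover `.nfs*` files are one thing I would look for, though that is only a guess on my part.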
PBS information:
Version: 3.4.2
CPU: Intel Xeon E5-2680 (2 sockets)
RAM: 338GB
Thanks in advance
