We have a Proxmox server running one Windows Server 2022 VM and one VM running PBS. The Proxmox server occasionally filled up its 32GB disk with log files, so I set up a datastore on the Proxmox host and shared it out to PBS via NFS. At the same time I set up another NFS share on the Proxmox host for backup storage, as backing up to an external Synology NAS (also via NFS) was glacial.
About 10 days ago the hard drive filled up on the Proxmox host. Not only did the backup fail, the Windows server became corrupted and wouldn't boot, saying
C:\Windows\System32\config\SYSTEM was corrupted. Obviously we couldn't locate the file in the backups because it was hidden, and who would ever need to restore a system file, right? If Windows marks it as hidden, I guess the smart thing to do is not make it findable in the backups. So we did a full restore from the NAS and 30 hours later had our Windows server running again. I changed the backup to store only 1 copy on the local drive and sync that to the NAS, which stores the last 4 backups. Then last night it happened again.
Only this time the single backup was taking only ~600GB of space and apparently completed successfully at 10pm, but the PBS logs share had a 300GB+ directory in it that filled the drive sometime between midnight and 7am this morning. Once again the Windows server is refusing to boot, with a missing/corrupted C:\Windows\System32\config\SYSTEM.
Restoring from the hard drive only took 49 minutes this time, but this is about the 4th time an errant PBS process has completely filled the hard drive with gigabytes of logs, and the second time it has destroyed a VM. At this point it's getting embarrassing, as I was the one who recommended PBS as suitable for production and it's clearly not. Unfortunately I haven't been able to study the logs when they fill up, as getting the server working has been the priority and I've had to delete them. This time I've set up a quota of 20GB on the logs dataset, and next time it happens I will be able to give it a fresh dataset without deleting anything. In the meantime, I'm wondering if anybody knows why PBS would generate 300GB of logs in one night, and/or why running out of disk space ends up corrupting files on a VM?
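For anyone wanting to see where the space went before deleting anything, here is a minimal sketch of how the largest entries under the logs share could be listed. It assumes Python 3 is available on the host, the function name dir_sizes is purely illustrative, and the path would be wherever the logs share is mounted (not given above). It sums apparent file sizes, which can differ from the space actually allocated on disk.

```python
import os

def dir_sizes(root):
    """Return (size_in_bytes, path) for each immediate child of root, largest first."""
    results = []
    for entry in os.scandir(root):
        if entry.is_dir(follow_symlinks=False):
            # Sum the apparent size of every file below this directory.
            total = 0
            for dirpath, _dirnames, filenames in os.walk(entry.path):
                for name in filenames:
                    try:
                        total += os.lstat(os.path.join(dirpath, name)).st_size
                    except OSError:
                        # File vanished or is unreadable; skip it.
                        pass
        else:
            try:
                total = entry.stat(follow_symlinks=False).st_size
            except OSError:
                continue
        results.append((total, entry.path))
    return sorted(results, reverse=True)
```

Calling dir_sizes on the mount point and printing the first few results should show which directory is growing, so that only that directory needs to be copied aside for later study.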
Additional information that may or may not be pertinent: the Windows server completed several Windows updates when it was restored from last night's backup. The server is also blocked from sending SMTP, so both Proxmox and PBS have been configured to send notifications via webhooks, which have been successfully sending notifications about backups, syncs and garbage collection with no indication of impending disaster.
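Since the existing notifications say nothing about free space, a separate low-disk check pushed through the same webhook might give some warning. The following is only a sketch under assumptions: it would be run periodically (for example from cron) on the Proxmox host, the function names and the 15% threshold are made up for illustration, and the JSON payload shape {"text": ...} is a guess that would need to match whatever the webhook receiver actually expects.

```python
import json
import shutil
import urllib.request

def low_space_message(path, min_free_fraction=0.15, usage=shutil.disk_usage):
    """Return a warning string if free space on path is below the threshold, else None."""
    total, used, free = usage(path)
    fraction = free / total
    if fraction < min_free_fraction:
        return (f"WARNING: {path} has {free / 1e9:.1f} GB free "
                f"({fraction:.0%} of {total / 1e9:.1f} GB)")
    return None

def send_webhook(url, text):
    """POST a JSON body to the webhook URL (payload format is an assumption)."""
    body = json.dumps({"text": text}).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

The check is kept separate from the sending so the threshold logic can be tried out without touching the network; send_webhook would only be called when low_space_message returns a string.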