I have set up the daDup S3 storage from Tuxis on my PBS 4 machine. I created a cache directory /cache, and since there is about 160 GB free on that filesystem I assumed that would be enough.
At first it did not work, but Tuxis pointed me to the fix of adding the line "put-rate-limit 10" to the s3.cfg file, and that seemed to do it. I was able to back up and restore from the S3 datastore, and for a file restore I could browse the filesystem, great!
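For reference, this is roughly what my entry in /etc/proxmox-backup/s3.cfg looks like with that option added, as far as I understand the format; the endpoint id, hostname, region and keys below are placeholders for my real values:

    s3-endpoint: dadup
        endpoint <your-s3-endpoint>
        region <your-region>
        access-key XXXXXXXX
        secret-key XXXXXXXX
        put-rate-limit 10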
There are a few things that keep it from being 100% perfect; these are my findings:
- Automatically verifying after the backup is not a good idea on S3 storage; it takes a long time. It is better to do it with a separate verify job at another time (see the sketch after this list).
- I have two cluster nodes, and backing up all VMs gives problems. Because two backups can run concurrently (one from each node), some fail and some don't, probably a network overload issue or something similar.
- When I ran a backup job for all machines on one node (which runs them in succession, one after the other), the PBS system more or less froze after three backups. Since I had to reboot the system there was not much left to find, only the message "backup close image failed: command error: stream closed because of a broken pipe".
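For the verification, my plan is to schedule it separately from the backup window, with something along these lines in /etc/proxmox-backup/verification.cfg (the web UI can create the same job); the job id, datastore name and schedule are just examples for my setup:

    verification: verify-dadup
        store dadup
        schedule sat 02:00
        ignore-verified true
        outdated-after 30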
Here are some ideas, which may already be implemented:
- Flush the backup at the end before starting a new one.
- Flush the cache at the end of each backup, or maybe at the start.
- Allow setting a maximum space usage for the cache, to prevent the system from locking up when the cache runs out of space. Small systems don't have the option of adding an extra disk for cache space.
- When backing up from different nodes, make sure only one backup runs at a time, maybe round-robin style.
Regards,
Albert