Hello,
I have a problem when backing up to a Ceph cluster of spinning disks.
I have a cluster of 27 server-class nodes with 60 OSDs on a 10 GbE network. If I back up ~10 VMs/CTs, it works fine. Upping that number to ~20, the backup grinds to a halt (write bandwidth in the KB/s range) but eventually finishes.
Doing a full backup of ~70 CTs/VMs hangs. All backup tasks are stuck pruning older backups, and the bandwidth to the disks is (again) in the KB/s range.
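For what it's worth, this is roughly how I watch the cluster while the backups stall. These are standard Ceph CLI calls; the MDS name in the last command is a placeholder for whatever your MDS instance is called.
Code:
# Overall cluster state; 'health detail' lists any slow or blocked ops
ceph status
ceph health detail

# CephFS view: MDS ranks, state, and client counts
ceph fs status

# In-flight MDS operations (run this on the node hosting the MDS;
# 'NAME' is a placeholder for the actual MDS instance name)
ceph daemon mds.NAME dump_ops_in_flight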
A temporary workaround is to restart the MDS server. That gets some of the pruning unstuck, but it gets stuck again soon after.
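In case it helps anyone, this is how I do the restart. I'm assuming the standard systemd unit here; replace NAME with the MDS instance name on your node (the one 'ceph fs status' shows as active).
Code:
# Restart the MDS daemon via its standard systemd unit;
# NAME is a placeholder for the MDS instance name
systemctl restart ceph-mds@NAME.service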
Here's the log of one of the backup jobs:
Code:
INFO: starting new backup job: vzdump 163 100 102 103 104 105 106 107 108 109 110 112 113 117 119 120 122 123 124 125 126 127 128 129 131 133 134 135 136 137 138 139 140 141 143 144 147 148 149 152 153 154 155 156 157 159 160 161 162 164 165 132 167 158 166 169 168 172 173 193 171 178 180 182 130 174 175 183 185 186 188 189 190 191 200 201 202 181 195 196 199 197 192 194 101 111 115 142 145 --all 0 --compress zstd --prune-backups 'keep-daily=3,keep-monthly=3,keep-weekly=3' --quiet 1 --storage ceph --mailnotification failure --mailto george.kyriazis@intel.com --node vis-clx-00 --mode snapshot
INFO: filesystem type on dumpdir is 'ceph' -using /var/tmp/vzdumptmp3772435_106 for temporary files
INFO: Starting Backup of VM 106 (lxc)
INFO: Backup started at 2022-12-01 17:32:33
INFO: status = running
INFO: CT Name: vis-ct-clx-00
INFO: including mount point rootfs ('/') in backup
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: create storage snapshot 'vzdump'
Logical volume "snap_vm-106-disk-0_vzdump" created.
INFO: creating vzdump archive '/mnt/pve/ceph/dump/vzdump-lxc-106-2022_12_01-17_32_33.tar.zst'
INFO: Total bytes written: 168008232960 (157GiB, 63MiB/s)
INFO: archive file size: 68.53GB
INFO: prune older backups with retention: keep-daily=3, keep-monthly=3, keep-weekly=3
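In case someone wants to reproduce the prune step outside of vzdump: as far as I understand, pruning just deletes old dump files from the storage, so timing a plain delete on the CephFS mount shows the same stall. The filename below is a made-up example; the dump path is from my setup above.
Code:
# Time the removal of an old dump file on the CephFS mount;
# the filename is a placeholder for one of the pruned backups
time rm /mnt/pve/ceph/dump/vzdump-lxc-106-2022_11_28-17_32_33.tar.zst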
Backups used to work fine earlier this year; the problem started a few months ago after some upgrade. The current Ceph version is 17.2.5.
Thanks!!