[SOLVED] Container restore expectation management?

wbk

Renowned Member
Oct 27, 2019
Hi all,

Over an hour and a half ago I started a restore task of a 400 GB container backup.

Up to now, restores have been of VMs and containers in the lower tens of GB range, which would ramble for a while and then complete.

This one just 'sits there':

Code:
recovering backed-up configuration from 'mt_pbs_localbaks_online:backup/ct/104/2026-03-14T23:54:13Z'
Formatting '/var/lib/vz/images/106/vm-106-disk-0.raw', fmt=raw size=484257562624 preallocation=off
Creating filesystem with 118226944 4k blocks and 29556736 inodes
Filesystem UUID: 3a565aaa-ad3e-4692-9f43-787b322e363e
Superblock backups stored on blocks: 
    32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
    4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968, 
restoring 'mt_pbs_localbaks_online:backup/ct/104/2026-03-14T23:54:13Z' now..

The receiving container is created on the receiving host.

Disk I/O on PBS is in the 0 to 40 kB/s range on the root disk and 0 kB/s on the backup storage. Network and CPU load are also no more than background noise.

The same goes for the receiving host.

Hardware shapes expectations: this is not a powerful system (E3-1220, LVM RAID10 with SSD caching), but it is an upgrade from the previous single-HDD Atom 525 system (which 'just worked').

This backup is from a container that used to be around 3 TB. Is PBS (very slowly) creating a list of chunks that are part of this container?
 
Hi news,

Thank you for your reaction!

The hardware I listed is the PBS host (1 Gbit copper), so ballpark about half the performance of your unit.

I just found the culprit: besides creating an LV for the container to live in on the assigned storage, the restore also creates an image in /var/lib/vz/images on the (smallish) rootfs of PVE.

It just doesn't fit, and stalled after 3 GB, having filled the rootfs.

Can I configure where this VZ image is created? I have two other storage definitions that accept VM disks and CT volumes, and I directed the restore at one of those, not at local storage. I also see that another container, which lives on separate storage, claims 11 GB on the PVE rootfs.

Any idea/pointer at documentation?
 
Hi news,

Thanks for the pointer.

The thread discusses whether the configured tmpdir is used or not, and reports that the problem was fixed by upgrading to the next minor version (all on PVE 6; I'm halfway through migrating from 8 to 9). The case that resembles my situation (@agapitox) never posted back with the result.

My /etc/vzdump.conf is at defaults, with tmpdir and dumpdir commented out as 'DIR' and storage as 'STORAGE_ID'. Matching that against the symptoms I perceived, I infer that either tmpdir or dumpdir is configured as, or defaults to, /var/lib/vz.
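For reference, the commented-out defaults in /etc/vzdump.conf look roughly like this (a sketch; key names and semantics per `man vzdump`, which strictly speaking governs backups, although the same directories appear implicated here):

```
# /etc/vzdump.conf (defaults, commented out)
#tmpdir: DIR            # store temporary files in DIR
#dumpdir: DIR           # store resulting backup files in DIR
#storage: STORAGE_ID    # store resulting backups on this storage
```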

Even so, summary of relevant bits:
  • 2 containers:
    • 104: 400 GB container on LV on sdb
    • 106: 400 GB container to be restored on LV on sda
  • PVE is installed on ZFS on sdc
    • local storage has all storage types (VM, CT, templates, backup)
    • /var/lib/vz is on the same partition as /
    • there's an 11 GB file in /var/lib/vz/images/104 (that shows as 3.7 T if squinting)
    • on restore, a file is created in /var/lib/vz/images/106
The images in /var/lib/vz don't seem to be temporary files, but the mounted containers. Are they indexes to the block devices under the containers?


Code:
# du -hs /var/lib/vz/images/*
11G     images/104
512     images/106  # after deleting the image to cancel '/ out of space'
# ls -hs /var/lib/vz/images/104/*
11G images/104/vm-104-disk-0.raw
# ls -hals /var/lib/vz/images/104/*
11G -rw-r----- 1 root root 3.7T Mar  4 19:39 images/104/vm-104-disk-0.raw
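The 11G-versus-3.7T discrepancy above is characteristic of a sparse file: `ls -l` reports the apparent size, while `du` and `ls -s` report the blocks actually allocated. A minimal demonstration with a hypothetical temp file:

```shell
#!/bin/sh
# Create a 1 GiB sparse file: the apparent size is 1 GiB, but (almost) no
# blocks are allocated until data is actually written into it.
f=$(mktemp)
truncate -s 1G "$f"
apparent=$(stat -c %s "$f")                 # bytes, as ls -l shows them
allocated=$(( $(stat -c %b "$f") * 512 ))   # 512-byte blocks on disk
echo "apparent=$apparent allocated=$allocated"
rm -f "$f"
```

By the same token, a 3.7 T apparent size with only 11 G of allocated blocks would mean only 11 G was ever written into the image.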

**edit**: I realize I previously did a deliberate test restore to too-small storage, which I cancelled. The 11 GB raw image with an apparent size of 3.7 T must be a stale image left over from that exercise.
In that case I restored to this storage on purpose, but in the current situation I did not write to that storage.

Let me get details, remove the stale image and retry restoring to the intended storage...
 
Last edited:
As a pointer for the next person facing this issue:

  • https://pve.proxmox.com/pve-docs-8/pct.1.html#_restoring_container_backups details:
    • Create volumes for storage backed mount points on the storage provided with the storage parameter (default: local).
    • So it would seem that I did not set the storage parameter, either in the backup or in the popup in the PVE GUI, or it is not available via the GUI.
  • To kill the task after it got stuck on '/ storage full'
    • all pvenode commands joined the queue of stuck processes, as did pvedaemon restart and pvam.
    • After having thusly created five stuck SSH sessions, I kill -9'd all sleeping (status D in `htop`) PVE-related tasks, to limited effect. I then kill -15'd everything else PVE-related, which at least threw the web interface off kilter and made other guests unavailable, but did nothing to get the task unstuck.
    • After the task had sat stuck for over four hours, I called `sync` to be sure, then `shutdown -hP now`, which kicked me out of SSH but did not shut the server down (as I found after logging in again)
    • Having initiated the shutdown, `reboot` told me `Call to Reboot failed: There's already a shutdown or sleep operation in progress`
    • Manually triggering sysrq did work, though, to ... get the task unstuck.
      • do not use these if not necessary:
      • echo 1 > /proc/sys/kernel/sysrq
      • echo b > /proc/sysrq-trigger
  • Is there a more elegant way to get out of such a situation?
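Going by the documentation quoted above, the storage parameter can at least be passed explicitly when restoring from the CLI instead of the GUI. A sketch using the IDs from this thread (treat the exact invocation as an assumption and check `pct help restore` first):

```
# Restore CT 106 from the PBS archive, placing its volumes on the 'yunos'
# storage instead of the default 'local'. Archive and storage IDs are the
# ones appearing elsewhere in this thread.
pct restore 106 \
    mt_pbs_localbaks_online:backup/ct/104/2026-03-14T23:54:13Z \
    --storage yunos
```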

Tasks that were sleeping:
Code:
    PID USER      PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command (merged)
      1 root       20   0  165M 10380  6160 S  0.0  0.0  0:28.80 /usr/lib/systemd/systemd│/sbin/init
   1423 root       20   0  223M  116M  5520 D  0.0  0.4  2:29.01 ├─ /usr/bin/perl│pve-ha-lrm
1903602 root 20   0  257M  148M  4200 D   0.0  0.5  0:00.12 ├─ /usr/bin/perl│task UPID:verjaardag:001D0BF2:052E04E6:69B68690:vzrestore:106:root@pam:
1906645 root 20   0  220M  117M  2096 D   0.0  0.4  0:00.00 ├─ /usr/bin/perl│pvescheduler
1957264 root 20   0  227M  118M  3356 D   0.0  0.4  0:00.00 ├─ /usr/bin/perl│pvescheduler
2018265 root 20   0  172M  107M 18948 D   0.0  0.3  0:01.17 ├─ /usr/bin/perl -T /usr/bin/pvedaemon stop
2018277 root 20   0  170M  105M 19028 D   0.0  0.3  0:01.13 ├─ /usr/bin/perl /usr/bin/pvestatd stop
2018451 root 20   0  173M  108M 18800 D   0.0  0.3  0:01.16 ├─ /usr/bin/perl /usr/bin/pvescheduler stop

This host is running PVE 8.4.17.
 
As for the stray 104-image of 11 GB in /var/lib/vz/images/104: I'm not allowed to delete it, as container 104 exists.

It was not my intention for 104 to use it, and the config seems to back me up:
Code:
# pct config 104
arch: amd64
cores: 3
cpuunits: 1201
features: nesting=1
hostname: online
memory: 8512
mp0: /mnt/nc_rest,mp=/mnt/nc_rest
mp1: /mnt/nc_linh,mp=/mnt/nc_linh
net0: name=eth0,bridge=vmbr0,gw=172.26.1.1,gw6=2a10:3781:2d49:172:26:001:001::,hwaddr=B0:DE:EB:5A:26:68,ip=172.26.3.104/16,ip6=2a10:3781:2d49:172:26:003:104::/64,type=veth
onboot: 1
ostype: debian
rootfs: yunos:vm-104-disk-0,mountoptions=lazytime,size=451G
startup: order=2,up=15
swap: 1024
unprivileged: 1

The contents of local storage:
Code:
# pvesm list local
Volid                                                   Format  Type               Size VMID
local:104/vm-104-disk-0.raw                             raw     rootdir   4012573196288 104
local:backup/vzdump-lxc-103-2020_01_12-14_48_23.tar.lzo tar.lzo backup       2647188762 103
local:vztmpl/debian-11-turnkey-core_17.1-1_amd64.tar.gz tgz     vztmpl        206782882
local:vztmpl/alpine-3.18-default_20230607_amd64.tar.xz  txz     vztmpl          2983844


On the bright side: retrying the restore from the GUI (after the reboot) seems to restore to the correct storage. I must have clicked wrong previously.
Code:
recovering backed-up configuration from 'mt_pbs_localbaks_online:backup/ct/104/2026-03-14T23:54:13Z'
  Logical volume "vm-106-disk-0" created.
Creating filesystem with 118226944 4k blocks and 29556736 inodes
Filesystem UUID: 9c973d88-8d88-4d1a-b92d-d24395f022a6
Superblock backups stored on blocks:
    32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
    4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
restoring 'mt_pbs_localbaks_online:backup/ct/104/2026-03-14T23:54:13Z' now..

It's recovering at a steady 50 MB/s as reported by disk I/O (350 IOPS), and shy of 400 Mbps as reported by the network. Nice enough for now.

To close with a reply to my question about expectation management:
  • The task viewer for a restore does not show restored chunks the way a backup task shows them in the other direction
    • *edit* I just saw that the task viewer on the PBS side does display the fetched chunks
  • There is no explicit feedback about the status of the task
  • If a chunk list is built at all, there is no discernible 'low load' phase for it before the task executes
  • There is visible load on CPU, storage and network
  • At 50 MB/s, a restore of 400 GB takes at least 2 hours
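The last estimate as quick shell arithmetic (decimal units assumed: 1 GB = 1000 MB):

```shell
#!/bin/sh
# Back-of-envelope restore duration from size and sustained throughput.
size_gb=400      # container size
rate_mb_s=50     # observed restore rate
secs=$(( size_gb * 1000 / rate_mb_s ))
printf '%d GB at %d MB/s: %d s (~%dh%02dm)\n' \
    "$size_gb" "$rate_mb_s" "$secs" $(( secs / 3600 )) $(( secs % 3600 / 60 ))
# → 400 GB at 50 MB/s: 8000 s (~2h13m)
```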
 
Last edited: