"File restore" in PVE GUI is not listing content of disk

Feb 9, 2022
Daily backups scheduled in PVE to PBS.
pve-manager/7.1-6/4e61e21c (running kernel: 5.13.19-1-pve)
proxmox-backup-server 2.1.4-1 running version: 2.1.4

I've only seen this on disks with btrfs; disks with xfs seem OK for all backups on all VMs.

- All is good for all backups on many VMs: "File restore" | expand "drive-scsi1.img.fidx" | expand "raw" lists the content, and download is possible.
- On at least one VM, "File restore" works for one backup, but the disk cannot be expanded to list content in the other backups.
- On at least one VM, "File restore" cannot expand the disk in any available backup.

This is kind of a show stopper for PBS in our environment.

Solution, or a way to troubleshoot would be appreciated.
/Bengt
 
Do you have a screenshot to illustrate the problem?
Does it work from the command line?
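To test from the command line, something along these lines should work on the PVE host with the `proxmox-file-restore` tool (the repository string, snapshot path, and archive name below are placeholders; adjust them to your setup):

```shell
# Placeholder repository and credentials -- substitute your own.
export PBS_REPOSITORY='backup@pbs@pbs.example.com:datastore1'
export PBS_PASSWORD='<password-or-api-token-secret>'

# List the archives at the root of a snapshot (placeholder snapshot name).
proxmox-file-restore list "vm/1786/2022-02-08T22:00:00Z" /

# Descend into the disk image that the GUI fails to expand.
proxmox-file-restore list "vm/1786/2022-02-08T22:00:00Z" "drive-scsi1.img.fidx/"
```

If the GUI fails but this also fails with an error, the error message is usually more informative on the CLI.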
 
Were the files written just before the backup? It could be that they were not synced to the disk in time. But that's just a guess.
 
The file system contains approx. 23 GB, the majority of which is static files. For this VM, it varies which backup has the disk content available: sometimes the oldest, the newest, or one in between. I have never seen more than one (7-day retention).
 
Can you post the log from the file-restore VM? It should be under '/var/log/proxmox-backup/file-restore/qemu.log'.
 
Is it always the same snapshots that fail, or does it fail only sometimes for those snapshots?
 
If it fails, it fails for the whole lifetime of the snapshot.
If it works, it works for the whole lifetime of the snapshot.
And, as mentioned, I have VMs where all snapshots are OK, and at least one where no snapshots work.
/Bengt
 
If you do a "normal" restore of such a snapshot, does the disk contain the data you want? (You can restore to a different VMID temporarily.)
 
Are you doing the backup with the guest agent (freeze/thaw)? Could you post a backup task log? My guess is that the btrfs is (sometimes) inconsistent at the time of the backup, and thus not mountable read-only the way we attempt to in the file-restore VM.
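One way to sanity-check the freeze/thaw path by hand is via the guest agent commands exposed through `qm agent` (the VMID below is a placeholder):

```shell
# Freeze all guest filesystems via the QEMU guest agent.
qm agent 1786 fsfreeze-freeze

# Confirm the state ("frozen" vs. "thawed").
qm agent 1786 fsfreeze-status

# Thaw again promptly -- the guest blocks writes while frozen.
qm agent 1786 fsfreeze-thaw
```

If the freeze or thaw step hangs or errors here, it would likely do the same during a backup.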
 
Hi,
Yes, we use the guest agent to perform backups. We use a hook script to suspend/resume database writes. It seems to work OK, but depending on DB activity it can take some time.
Restoring a backup where the disk content is not shown in the GUI: the VM does not boot due to btrfs errors, as if no fs-freeze had been done.
Restoring a backup where the disk content is shown in the GUI: the VM boots OK and all is good.
Attached is a file showing snippets of the backup tasks for a failing and an OK restore (both look OK), and a backup task from today which has an fs-thaw error, probably caused by the hook script's DB write suspend/resume taking too long.
/Bengt
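For readers following along, a vzdump hook script of the kind described above might be sketched as follows. This is a hypothetical minimal skeleton: the phase names follow the vzdump hook mechanism (the script is called with the phase as its first argument), and the actual suspend/resume commands are placeholders for whatever your database requires:

```shell
#!/bin/bash
# Hypothetical vzdump hook script skeleton: suspend database writes
# before the backup of the VM starts, resume them afterwards.
hook() {
    case "$1" in
        backup-start)
            echo "suspending database writes"
            # placeholder: run your DB-specific write-suspend command here
            ;;
        backup-end|backup-abort)
            echo "resuming database writes"
            # placeholder: run your DB-specific write-resume command here
            ;;
    esac
}

hook "$1"
```

Handling `backup-abort` as well as `backup-end` matters: otherwise a failed backup can leave the database suspended indefinitely, which could also explain long freeze windows.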
 
> Yes, we use the guest agent to perform backups. We use a hook script to suspend/resume database writes. It seems to work OK, but depending on DB activity it can take some time.
thanks!
> Restoring a backup where the disk content is not shown in the GUI: the VM does not boot due to btrfs errors, as if no fs-freeze had been done.
> Restoring a backup where the disk content is shown in the GUI: the VM boots OK and all is good.
Could you give details about the 'not booting due to btrfs errors' part? Which errors does it show? Is there any manual recovery that works? In contrast to the file-restore VM (where we only have read-only access), a regular restore and boot has the disk available for writing, so any replaying/automatic recovery of an inconsistent state should work.

Also, the kernel version used inside the VM might be interesting.
 
We were able to manually recover /dev/sda2 (root) with "btrfs rescue chunk-recover".
Same type of errors on /dev/sdd; we tried "btrfs rescue chunk-recover" to recover from that, but gave up after 5 attempts where "wanted" increased by one after each run. It's a test system, so no panic.
BTRFS error (device sdd): parent transid verify failed on 163938304 wanted 1499040 found 1499039
> btrfs rescue chunk-recover /dev/sdd
BTRFS error (device sdd): parent transid verify failed on 163938304 wanted 1499041 found 1499039

The vm kernel is 5.3.18-59.37-default
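As a side note for recovery attempts on transid errors like the ones above: btrfs keeps backup tree roots, and a read-only mount that falls back to one of them sometimes succeeds where a normal mount fails. On a 5.3 guest kernel the older option spelling applies; the device and mount point below are placeholders:

```shell
# Read-only mount using a backup tree root. On kernels >= 5.11 the
# spelling is "-o ro,rescue=usebackuproot" instead.
mount -o ro,usebackuproot /dev/sdd /mnt/recover

# Read-only consistency check; does not modify the filesystem.
btrfs check --readonly /dev/sdd
```

This is only a diagnostic sketch; whether it helps depends on how far back the corruption goes.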
 
One more question: are your disks independent single-disk btrfs filesystems, or raided somehow?
 
Hypervisor storage is an external ceph cluster, version 16.2.6.

Storage config:
content images,rootdir
krbd 0
pool abc

VM config:
agent: 1
bootdisk: scsi0
cores: 10
cpu: Broadwell
hotplug: disk,network,usb,memory,cpu
ide2: none,media=cdrom
memory: 1033216
name: host-name
net0: virtio=BE:B5:57:94:87:2E,bridge=vmbr0,firewall=1,tag=378
numa: 1
ostype: l26
scsi0: abc:vm-1786-disk-0,cache=writeback,discard=on,size=10G,ssd=1
scsi1: abc:vm-1786-program,cache=writeback,discard=on,size=200G,ssd=1
scsi2: abc:vm-1786-dbdata,cache=writeback,discard=on,size=1T,ssd=1
scsi3: abc:vm-1786-dblog,cache=writeback,discard=on,size=500G,ssd=1
scsihw: virtio-scsi-pci
serial0: socket
smbios1: uuid=b62e42e7-0c32-4cdc-af39-181ab792d48f
sockets: 4
vcpus: 32
vmgenid: f761834f-ec9c-45f6-a1ed-6abafc0e582f
 
The output of pveversion -v would also be great! Thanks.
 