[solved] Cannot live restore VM

storelli · Oct 24, 2023

Hello,

We have a two independant PVE clusters connected to two PBS servers (one per cluster). On one cluster live restore is working properly as well as normal restore. But on the other one live-restore is always failing (on all nodes).
The only difference we can see between those two clusters is that one has been recently installed with PVE 8 (the working one) and the other one has been upgraded from the 7 version.

The PVE version is pve-manager/8.0.4/d258a813cfa6b390 (running kernel: 6.2.16-15-pve)

And the error received is :

Code:

new volume ID is 'storage:vm-999-disk-0'
rescan volumes...
VM is locked (create)
starting VM for live-restore
repository: 'USER@XXX:BackupServer', snapshot: 'vm/100000004/2023-10-24T02:27:57Z'
restoring 'drive-virtio0' to 'storage:vm-999-disk-0'
kvm: -drive file.filename=rbd:storage/vm-999-disk-0:conf=/etc/pve/ceph.conf:id=admin:keyring=/etc/pve/priv/ceph/storage.keyring,if=none,id=drive-virtio0,discard=on,throttling.iops-read=350,throttling.iops-read-max=1000,throttling.iops-write=350,throttling.iops-write-max=1000,format=alloc-track,file.driver=rbd,cache=none,aio=io_uring,file.detect-zeroes=unmap,backing=drive-virtio0-pbs,auto-remove=on: warning: RBD options encoded in the filename as keyvalue pairs is deprecated
restore-drive-virtio0: transferred 0.0 B of 20.0 GiB (0.00%) in 0s
restore-drive-virtio0: Cancelling block job
An error occurred during live-restore: block job (stream) error: VM 999 not running

trying to acquire lock...
 OK
error before or during data restore, some or all disks were not completely restored. VM 999 state is NOT cleaned up.
live-restore failed

We have tried to gather more information by looking at the qmrestore perl code and what we can say is :
- the VM is launching properly and stay alive if we put some sleep in the code.
- in the QemuServer.pm in sub pbs_live_restore there is those two lines of code

Code:

mon_cmd($vmid, 'cont');
qemu_drive_mirror_monitor($vmid, undef, $jobs, 'auto', 0, 'stream');

If we sleep before mon_cmd() the VM is still running according to linux "ps aux". After this line the process disappear.

- In the QemuServer/Helpers.pm sub vm_running_locally. If we had some debug to have the function look like this

Code:

sub vm_running_locally {
    my ($vmid) = @_;

    my $pidfile = pidfile_name($vmid);

print "entering vm_running_locally\n";
    if (my $fd = IO::File->new("<$pidfile")) {
        my $st = stat($fd);
        my $line = <$fd>;
        close($fd);

        my $mtime = $st->mtime;
        if ($mtime > time()) {
            warn "file '$pidfile' modified in future\n";
        }
        if ($line =~ m/^(\d+)$/) {
print "in if (\$line =~ ...\n";
            my $pid = $1;
            my $cmdline = parse_cmdline($pid);
            if ($cmdline && defined($cmdline->{pidfile}) && $cmdline->{pidfile}->{value}
                && $cmdline->{pidfile}->{value} eq $pidfile) {
print "in if(\$cmdline...\n";
                if (my $pinfo = PVE::ProcFSTools::check_process_running($pid)) {
print "return pid $pid\n";
                    return $pid;
                }
            }
        }
    }
print "return null\n";
    return;
}

then the return trace seems to tell that the VM dies somewhere after restoring the first bytes.

Code:

kvm: -drive file.filename=rbd:storage/vm-999-disk-0:conf=/etc/pve/ceph.conf:id=admin:keyring=/etc/pve/priv/ceph/storage.keyring,if=none,id=drive-virtio0,discard=on,throttling.iops-read=350,throttling.iops-read-max=1000,throttling.iops-write=350,throttling.iops-write-max=1000,format=alloc-track,file.driver=rbd,cache=none,aio=io_uring,file.detect-zeroes=unmap,backing=drive-virtio0-pbs,auto-remove=on: warning: RBD options encoded in the filename as keyvalue pairs is deprecated
entering vm_running_locally
in if ($line =~ ...
in if($cmdline...
return pid 3524918
entering vm_running_locally
in if ($line =~ ...
in if($cmdline...
return pid 3524918
entering vm_running_locally
in if ($line =~ ...
in if($cmdline...
return pid 3524918
entering vm_running_locally
in if ($line =~ ...
in if($cmdline...
return pid 3524918
entering vm_running_locally
in if ($line =~ ...
in if($cmdline...
return pid 3524918
restore-drive-virtio0: transferred 0.0 B of 20.0 GiB (0.00%) in 0s
entering vm_running_locally
in if ($line =~ ...
return null
restore-drive-virtio0: Cancelling block job
entering vm_running_locally
in if ($line =~ ...
return null
entering vm_running_locally
in if ($line =~ ...
return null
An error occurred during live-restore: block job (stream) error: VM 999 not running

entering vm_running_locally
in if ($line =~ ...
return null
error before or during data restore, some or all disks were not completely restored. VM 999 state is NOT cleaned up.
live-restore failed

I would be happy to provide more information on that but not sure what could be relevant or not.

Many thanks for your help.

fiona · Oct 25, 2023

Hi,
can you see anything in the system logs/journal around the time the issue occurs? Please share the VM configuration from the backup (Show Configuration in the UI) and the output of pveversion -v.

storelli · Oct 25, 2023

Hello. Thanks for your questions.

I forgot to mention that we have seen nothing special in the logs. Is there a specific log file we should look at ?

Here is the VM configuration but for information the problem seems to be with all VMs (at least all we tested).

Code:

Header
Proxmox
Virtual Environment 8.0.4
Virtual Machine 100000004 (XYZ) on node 'XYZ'
Filter VMID
Logs
()
agent: 1,fstrim_cloned_disks=1
boot: order=virtio0
cores: 1
cpu: x86-64-v3
hotplug: disk,network,usb,memory
memory: 2048
meta: creation-qemu=7.0.0,ctime=1667989325
name: XYZ
net0: virtio=C6:F0:5F:E0:BB:25,bridge=vmbr1010,firewall=1,rate=125
numa: 1
ostype: l26
protection: 1
scsihw: virtio-scsi-pci
smbios1: uuid=00dbb067-5e81-40ef-981e-d39f2162a2ad
sockets: 1
tags:
vga: qxl
virtio0: storage:vm-100000004-disk-0,discard=on,iops_rd=350,iops_rd_max=1000,iops_wr=350,iops_wr_max=1000,iothread=1,size=20G
vmgenid: e8bb0cba-6f02-4a2e-8ee9-08ee933e9d68
#qmdump#map:virtio0:drive-virtio0:silver:raw:

pveversion :

Code:

proxmox-ve: 8.0.2 (running kernel: 6.2.16-15-pve)
pve-manager: 8.0.4 (running version: 8.0.4/d258a813cfa6b390)
proxmox-kernel-helper: 8.0.3
pve-kernel-5.15: 7.4-5
proxmox-kernel-6.2.16-15-pve: 6.2.16-15
proxmox-kernel-6.2: 6.2.16-15
proxmox-kernel-6.2.16-10-pve: 6.2.16-10
pve-kernel-5.15.111-1-pve: 5.15.111-1
ceph: 17.2.6-pve1+3
ceph-fuse: 17.2.6-pve1+3
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx5
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.26-pve1
libproxmox-acme-perl: 1.4.6
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.1
libpve-access-control: 8.0.5
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.0.9
libpve-guest-common-perl: 5.0.5
libpve-http-server-perl: 5.0.4
libpve-rs-perl: 0.8.5
libpve-storage-perl: 8.0.2
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-2
proxmox-backup-client: 3.0.3-1
proxmox-backup-file-restore: 3.0.3-1
proxmox-kernel-helper: 8.0.3
proxmox-mail-forward: 0.2.0
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.2
proxmox-widget-toolkit: 4.0.9
pve-cluster: 8.0.4
pve-container: 5.0.4
pve-docs: 8.0.5
pve-edk2-firmware: 3.20230228-4
pve-firewall: 5.0.3
pve-firmware: 3.8-2
pve-ha-manager: 4.0.2
pve-i18n: 3.0.7
pve-qemu-kvm: 8.0.2-6
pve-xtermjs: 4.16.0-3
qemu-server: 8.0.7
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.1.13-pve1

fiona · Oct 25, 2023

HungryMate10 said:
I forgot to mention that we have seen nothing special in the logs. Is there a specific log file we should look at ?

If it exists, you can use /var/log/syslog, or in any case journalctl -b (best to redirect to a file to avoid cutoff for long lines) from around the time the issue happened. Sharing both is not necessary, because the should essentially contain the same information.

storelli · Oct 25, 2023

I didn't notice it yesterday but there is perhaps something interesting in the log :

Code:

Oct 24 17:37:21 ns3199628 qmrestore[3408872]: <root@pam> starting task UPID:ns3199628:003403E9:030CD98C:6537E4B1:qmrestore:999:root@pam:
Oct 24 17:37:25 ns3199628 systemd[1]: Started 999.scope.
Oct 24 17:37:25 ns3199628 kernel: device tap999i0 entered promiscuous mode
Oct 24 17:37:25 ns3199628 kernel: vmbr1010: port 10(fwpr999p0) entered blocking state
Oct 24 17:37:25 ns3199628 kernel: vmbr1010: port 10(fwpr999p0) entered disabled state
Oct 24 17:37:25 ns3199628 kernel: device fwpr999p0 entered promiscuous mode
Oct 24 17:37:25 ns3199628 kernel: vmbr1010: port 10(fwpr999p0) entered blocking state
Oct 24 17:37:25 ns3199628 kernel: vmbr1010: port 10(fwpr999p0) entered forwarding state
Oct 24 17:37:25 ns3199628 kernel: fwbr999i0: port 1(fwln999i0) entered blocking state
Oct 24 17:37:25 ns3199628 kernel: fwbr999i0: port 1(fwln999i0) entered disabled state
Oct 24 17:37:25 ns3199628 kernel: device fwln999i0 entered promiscuous mode
Oct 24 17:37:25 ns3199628 kernel: fwbr999i0: port 1(fwln999i0) entered blocking state
Oct 24 17:37:25 ns3199628 kernel: fwbr999i0: port 1(fwln999i0) entered forwarding state
Oct 24 17:37:25 ns3199628 kernel: fwbr999i0: port 2(tap999i0) entered blocking state
Oct 24 17:37:25 ns3199628 kernel: fwbr999i0: port 2(tap999i0) entered disabled state
Oct 24 17:37:25 ns3199628 kernel: fwbr999i0: port 2(tap999i0) entered blocking state
Oct 24 17:37:25 ns3199628 kernel: fwbr999i0: port 2(tap999i0) entered forwarding state
Oct 24 17:37:25 ns3199628 kernel: HTB: quantum of class 10001 is big. Consider r2q change.
Oct 24 17:37:26 ns3199628 QEMU[3409514]: qemu: qemu_mutex_unlock_impl: Operation not permitted
Oct 24 17:37:26 ns3199628 kernel: fwbr999i0: port 2(tap999i0) entered disabled state
Oct 24 17:37:26 ns3199628 kernel: fwbr999i0: port 2(tap999i0) entered disabled state
Oct 24 17:37:26 ns3199628 systemd[1]: 999.scope: Deactivated successfully.
Oct 24 17:37:27 ns3199628 qmrestore[3408873]: VM 999 qmp command failed - VM 999 not running
Oct 24 17:37:27 ns3199628 qmrestore[3408873]: VM 999 qmp command failed - VM 999 not running
Oct 24 17:37:27 ns3199628 qmrestore[3408873]: VM 999 qmp command failed - VM 999 not running
Oct 24 17:37:27 ns3199628 qmrestore[3408873]: live-restore failed
Oct 24 17:37:27 ns3199628 qmrestore[3408872]: <root@pam> end task UPID:ns3199628:003403E9:030CD98C:6537E4B1:qmrestore:999:root@pam: live-restore failed

fiona · Oct 25, 2023

HungryMate10 said:
Oct 24 17:37:26 ns3199628 QEMU[3409514]: qemu: qemu_mutex_unlock_impl: Operation not permitted

Yes, there is an assertion failure here. Could you install debugger and debug symbols apt install pve-qemu-kvm-dbgsym gdb and then either

HungryMate10 said:
- in the QemuServer.pm in sub pbs_live_restore there is those two lines of code
Code:

mon_cmd($vmid, 'cont'); qemu_drive_mirror_monitor($vmid, undef, $jobs, 'auto', 0, 'stream');
If we sleep before mon_cmd() the VM is still running according to linux "ps aux". After this line the process disappear.
Click to expand...
1. Introduce this sleep (long enough so you can run the next command)
2. Attach GDB to the VM with with gdb --ex 'set pagination off' --ex 'handle SIGUSR1 noprint nostop' --ex 'handle SIGPIPE noprint nostop' --ex 'c' -p $(cat /var/run/qemu-server/999.pid)
3. Wait for the sleep to finish and the failure
4. Run t a a bt in GDB to obtain the backtraces

Install systemd-coredump. Trigger the failure. Now there should be a coredump and you can do

Code:

cd /var/lib/systemd/coredump/
ls
# replace <XYZ> with what you actually have
zstd -d core.<XYZ>.zst
gdb kvm core.<XYZ>
# In GDB, run 't a a bt'

fiona · Oct 25, 2023

I think I was able to reproduce the issue now with a similar configuration. It seems to be the combination of iothread and the IO limits on the disk. Might be some kind of race during setup of the stream block job that's used to copy the contents from the backup image in the background during live restore. Will investigate further. Thank you for the report!

storelli · Oct 25, 2023

You're welcome.
Thanks for your investigation and the future patch.

I confirm that if I remove the IO limits on the VM and create a new backup without this options then the live restore work properly.

I would be happy to participate in the testing of the patch if needed.

fiona · Nov 3, 2023

I cannot reproduce the issue anymore with the upcoming QEMU 8.1 package, even when setting the iops limits to 10 instead. The package is not yet public unfortunately, but very likely will land in the pvetest repository during the coming week.

storelli · Nov 7, 2023

Thanks a lot. I'll check from time to time to see if available and come back here if problem is not solved.

fiona · Nov 10, 2023

FYI, the QEMU 8.1 package is available for testing now: https://forum.proxmox.com/threads/qemu-8-1-available-on-pvetest-as-of-now.136227/

storelli · Nov 13, 2023

Thanks for the information.
I have just tested with our test environment and confirm that I cannot reproduce this issue with qemu 8.1.

[solved] Cannot live restore VM

storelli

Member

fiona

Proxmox Staff Member

storelli

Member

fiona

Proxmox Staff Member

storelli

Member

fiona

Proxmox Staff Member

fiona

Proxmox Staff Member

storelli

Member

fiona

Proxmox Staff Member

storelli

Member

fiona

Proxmox Staff Member

storelli

Member

We value your privacy